Search Results: "jack"

20 December 2022

Ian Jackson: Rust for the Polyglot Programmer, December 2022 edition

I have reviewed, updated and revised my short book about the Rust programming language, Rust for the Polyglot Programmer. It now covers some language improvements from the past year (noting which versions of Rust they're available in), and has been updated for changes in the Rust library ecosystem. With (further) assistance from Mark Wooding, there is also a new table of recommendations for numerical conversion.

Recap about Rust for the Polyglot Programmer

There are many introductory materials about Rust. This one is rather different. Compared to much other information about Rust, Rust for the Polyglot Programmer is:

After reading Rust for the Polyglot Programmer, you won't know everything you need to know to use Rust for any project, but should know where to find it.

Comments are welcome of course, via the Dreamwidth comments or Salsa issue or MR. (If you're making a contribution, please indicate your agreement with the Developer Certificate of Origin.)
edited 2022-12-20 01:48 to fix a typo



18 December 2022

Ian Jackson: Rust needs #[throws]

tl;dr: Ok-wrapping as needed in today's Rust is a significant distraction, because there are multiple ways to do it. They are all slightly awkward in different ways, so are least-bad in different situations. You must choose a way for every fallible function, and sometimes change a function from one pattern to another. Rust really needs #[throws] as a first-class language feature. Code using #[throws] is simpler and clearer. Please try out withoutboats's fehler. I think you will like it.

A recent personal experience in coding style

Ever since I read withoutboats's 2020 article about fehler, I have been using it in most of my personal projects. For Reasons I recently had a go at eliminating the dependency on fehler from Hippotat. So, I made a branch, deleted the dependency and imports, and started on the whack-a-mole with the compiler errors. After about a half hour of this, I was starting to feel queasy. After an hour I had decided that basically everything I was doing was making the code worse. And, bizarrely, I kept having to make individual decisions about what idiom to use in each place. I couldn't face it any more. After sleeping on the question I decided that Hippotat would be in Debian with fehler, or not at all. Happily the Debian Rust Team generously helped me out, so the answer is that fehler is now in Debian, so it's fine. For me this experience, of trying to convert Rust-with-#[throws] to Rust-without-#[throws], brought the Ok wrapping problem into sharp focus.

What is Ok wrapping? Intro to Rust error handling

(You can skip this section if you're already a seasoned Rust programmer.) In Rust, fallibility is represented by functions that return Result<SuccessValue, Error>: this is a generic type, representing either whatever SuccessValue is (in the Ok variant of the data-bearing enum) or some Error (in the Err variant).
For example, std::fs::read_to_string, which takes a filename and returns the contents of the named file, returns Result<String, std::io::Error>. This is a nice and typesafe formulation of, and generalisation of, the traditional C practice, where a function indicates in its return value whether it succeeded, and errors are indicated with an error code. Result is part of the standard library and there are convenient facilities for checking for errors, extracting successful results, and so on. In particular, Rust has the postfix ? operator, which, when applied to a Result, does one of two things: if the Result was Ok, it yields the inner successful value; if the Result was Err, it returns early from the current function, returning an Err in turn to the caller. This means you can write things like this:
    let input_data = std::fs::read_to_string(input_file)?;
and the error handling is pretty automatic. You get a compiler warning, or a type error, if you forget the ?, so you can't accidentally ignore errors. But, there is a downside. When you are returning a successful outcome from your function, you must convert it into a Result. After all, your fallible function has return type Result<SuccessValue, Error>, which is a different type to SuccessValue. So, for example, inside std::fs::read_to_string, we see this:
        let mut string = String::new();
        file.read_to_string(&mut string)?;
        Ok(string)
     
string has type String; fs::read_to_string must return Result<String, ..>, so at the end of the function we must return Ok(string). This applies to return statements, too: if you want an early successful return from a fallible function, you must write return Ok(whatever).

This is particularly annoying for functions that don't actually return a nontrivial value. Normally, when you write a function that doesn't return a value you don't write the return type. The compiler interprets this as syntactic sugar for -> (), ie, that the function returns (), the empty tuple, used in Rust as a dummy value in these kinds of situations. A block ({ ... }) whose last statement ends in a ; has type (). So, when you fall off the end of a function, the return value is (), without you having to write it. So you simply leave out the stuff in your program about the return value, and your function doesn't have one (i.e. it returns ()).

But, a function which either fails with an error, or completes successfully without returning anything, has return type Result<(), Error>. At the end of such a function, you must explicitly provide the success value. After all, if you just fall off the end of a block, it means the block has value (), which is not of type Result<(), Error>. So the fallible function must end with Ok(()), as we see in the example for std::fs::read_to_string.

A minor inconvenience, or a significant distraction?

I think the need for Ok-wrapping on all success paths from fallible functions is generally regarded as just a minor inconvenience. Certainly the experienced Rust programmer gets very used to it. However, while trying to remove fehler's #[throws] from Hippotat, I noticed something that is evident in codebases using vanilla Rust (without fehler) but which goes un-remarked. There are multiple ways to write the Ok-wrapping, and the different ways are appropriate in different situations. See the following examples, all taken from a real codebase.
(And it's not just me: I do all of these in different places, when I don't have fehler available, but all these examples are from code written by others.)

Idioms for Ok-wrapping - a bestiary

Wrap just a returned variable binding

If you have the return value in a variable, you can write Ok(retval) at the end of the function, instead of retval.
    pub fn take_until(&mut self, term: u8) -> Result<&'a [u8]> {
        // several lines of code
        Ok(result)
    }
If the returned value is not already bound to a variable, making a function fallible might mean choosing to bind it to a variable.

Wrap a nontrivial return expression

Even if it's not just a variable, you can wrap the expression which computes the returned value. This is often done if the returned value is a struct literal:
    fn take_from(r: &mut Reader<'_>) -> Result<Self> {
        // several lines of code
        Ok(AuthChallenge { challenge, methods })
    }
Introduce Ok(()) at the end

For functions returning Result<()>, you can write Ok(()). This is usual, but not ubiquitous, since sometimes you can omit it.

Wrap the whole body

If you don't have the return value in a variable, you can wrap the whole body of the function in Ok( ). Whether this is a good idea depends on how big and complex the body is.
    fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
        Ok(match s {
            "Authority" => RelayFlags::AUTHORITY,
            // many other branches
            _ => RelayFlags::empty(),
        })
    }
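As a self-contained illustration of the "wrap the whole body" idiom, here is a minimal sketch. The Flag type and its variants are hypothetical, invented for this example; they are not taken from the codebase quoted above.

```rust
use std::str::FromStr;

// Hypothetical flag type, for illustration only.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Flag {
    Authority,
    Exit,
    Unknown,
}

impl FromStr for Flag {
    // Parsing cannot fail in this sketch, but FromStr still demands a Result.
    type Err = std::convert::Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The match expression is the success value,
        // so the whole match is wrapped in Ok( ).
        Ok(match s {
            "Authority" => Flag::Authority,
            "Exit" => Flag::Exit,
            _ => Flag::Unknown,
        })
    }
}
```

With this in place, "Authority".parse::<Flag>() yields Ok(Flag::Authority), and any unrecognised string yields Ok(Flag::Unknown). Note that the Ok( and its closing ) end up separated by every match arm, which is part of why this idiom scales poorly to large bodies.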
Omit the wrap when calling fallible sub-functions

If your function wraps another function call of the same return and error type, you don't need to write the Ok at all. Instead, you can simply call the function and not apply ?. You can do this even if your function selects between a number of different sub-functions to call:
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        if flags::unsafe_logging_enabled() {
            std::fmt::Display::fmt(&self.0, f)
        } else {
            self.0.display_redacted(f)
        }
    }
But this doesn't work if the returned error type isn't the same, but needs the autoconversion implied by the ? operator.

Convert a fallible sub-function error with Ok( ... ?)

If the final thing a function does is chain to another fallible function, but with a different error type, the error must be converted somehow. This can be done with ?.
     fn try_from(v: i32) -> Result<Self, Error> {
         Ok(Percentage::new(v.try_into()?))
     }
Convert a fallible sub-function error with .map_err

Or, rarely, people solve the same problem by converting explicitly with .map_err:
     pub fn create_unbootstrapped(self) -> Result<TorClient<R>> {
         // several lines of code
         TorClient::create_inner(
             // several parameters
         )
         .map_err(ErrorDetail::into)
     }
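To see that the last two idioms really are interchangeable, here is a small sketch that writes the same function both ways. ConfigError and the two parse_port functions are hypothetical names made up for this example; only the standard library is used.

```rust
use std::num::ParseIntError;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ConfigError {
    BadNumber(ParseIntError),
}

// This impl is what allows `?` to convert ParseIntError into ConfigError.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

// Idiom: unwrap with `?` (which converts the error), then re-wrap with Ok.
fn parse_port_question_mark(s: &str) -> Result<u16, ConfigError> {
    Ok(s.trim().parse::<u16>()?)
}

// Idiom: keep the Result intact and convert only the error side.
fn parse_port_map_err(s: &str) -> Result<u16, ConfigError> {
    s.trim().parse::<u16>().map_err(ConfigError::from)
}
```

Both functions return the same value for every input. Which one a given author picks is a matter of taste, which is exactly the kind of per-function decision described above.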
What is to be done, then?

The fehler library is in excellent taste and has the answer. With fehler: fehler provides: This is precisely correct. It is very ergonomic. Consequences include:

Limitations of fehler

But, fehler is a Rust procedural macro, so it cannot get everything right. Sadly there are some wrinkles. But, Rust-with-#[throws] is so much nicer a language than Rust-with-mandatory-Ok-wrapping, that these are minor inconveniences.

Please can we have #[throws] in the Rust language

This ought to be part of the language, not a macro library. In the compiler, it would be possible to get all the corner cases right. It would make the feature available to everyone, and it would quickly become idiomatic Rust throughout the community. It is evident from reading writings from the time, particularly those from withoutboats, that there were significant objections to automatic Ok-wrapping. It seems to have become quite political, and some folks burned out on the topic. Perhaps, now, a couple of years later, we can revisit this area and solve this problem in the language itself?

Explicitness

An argument I have seen made against automatic Ok-wrapping, and, in general, against any kind of useful language affordance, is that it makes things less explicit. But this argument is fundamentally wrong for Ok-wrapping. Explicitness is not an unalloyed good. We humans have only limited attention. We need to focus that attention where it is actually needed. So explicitness is good in situations where what is going on is unusual; or would otherwise be hard to read; or is tricky or error-prone. Generally: explicitness is good for things where we need to direct humans' attention. But Ok-wrapping is ubiquitous in fallible Rust code. The compiler mechanisms and type systems almost completely defend against mistakes. All but the most novice programmer knows what's going on, and the very novice programmer doesn't need to.
Rust's error handling arrangements are designed specifically so that we can avoid worrying about fallibility unless necessary, except for the Ok-wrapping. Explicitness about Ok-wrapping directs our attention away from whatever other things the code is doing: it is a distraction. So, explicitness about Ok-wrapping is a bad thing.

Appendix - examples showing code with Ok wrapping is worse than code using #[throws]

Observe these diffs, from my abandoned attempt to remove the fehler dependency from Hippotat. I have a type alias AE for the usual error type (AE stands for anyhow::Error). In the non-#[throws] code, I end up with a type alias AR<T> for Result<T, AE>, which I think is more opaque but at least that avoids typing out -> Result< , AE> a thousand times. Some people like to have a local Result alias, but that means that the standard Result has to be referred to as StdResult or std::result::Result.
Return value clearer, error return less wordy:

With fehler and #[throws]:

    impl Parseable for Secret {
        #[throws(AE)]
        fn parse(s: Option<&str>) -> Self {
            let s = s.value()?;
            if s.is_empty() { throw!(anyhow!("secret value cannot be empty")) }
            Secret(s.into())
        }
    }

Vanilla Rust, Result<>, mandatory Ok-wrapping:

    impl Parseable for Secret {
        fn parse(s: Option<&str>) -> AR<Self> {
            let s = s.value()?;
            if s.is_empty() { return Err(anyhow!("secret value cannot be empty")) }
            Ok(Secret(s.into()))
        }
    }

No need to wrap whole match statement in Ok( ):

With fehler and #[throws]:

    #[throws(AE)]
    pub fn client<T>(&self, key: &'static str, skl: SKL) -> T
    where T: Parseable + Default {
        match self.end {
            LinkEnd::Client => self.ordinary(key, skl)?,
            LinkEnd::Server => default(),
        }
    }

Vanilla Rust, Result<>, mandatory Ok-wrapping:

    pub fn client<T>(&self, key: &'static str, skl: SKL) -> AR<T>
    where T: Parseable + Default {
        Ok(match self.end {
            LinkEnd::Client => self.ordinary(key, skl)?,
            LinkEnd::Server => default(),
        })
    }

Return value and Ok(()) entirely replaced by #[throws]:

With fehler and #[throws]:

    impl Display for Loc {
        #[throws(fmt::Error)]
        fn fmt(&self, f: &mut fmt::Formatter) {
            write!(f, "{:?}:{}", &self.file, self.lno)?;
            if let Some(s) = &self.section {
                write!(f, )?;
            }
        }
    }

Vanilla Rust, Result<>, mandatory Ok-wrapping:

    impl Display for Loc {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, "{:?}:{}", &self.file, self.lno)?;
            if let Some(s) = &self.section {
                write!(f, )?;
            }
            Ok(())
        }
    }

Call to write! now looks the same as in more complex case shown above:

With fehler and #[throws]:

    impl Debug for Secret {
        #[throws(fmt::Error)]
        fn fmt(&self, f: &mut fmt::Formatter) {
            write!(f, "Secret(***)")?;
        }
    }

Vanilla Rust, Result<>, mandatory Ok-wrapping:

    impl Debug for Secret {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, "Secret(***)")
        }
    }

Much tiresome return Ok() noise removed:

With fehler and #[throws]:

    impl FromStr for SectionName {
        type Err = AE;
        #[throws(AE)]
        fn from_str(s: &str) -> Self {
            match s {
                COMMON => return SN::Common,
                LIMIT => return SN::GlobalLimit,
                _ => { }
            };
            if let Ok(n@ ServerName(_)) = s.parse() { return SN::Server(n) }
            if let Ok(n@ ClientName(_)) = s.parse() { return SN::Client(n) }
            if client == LIMIT { return SN::ServerLimit(server) }
            let client = client.parse().context("client name in link section name")?;
            SN::Link(LinkName { server, client })
        }
    }

Vanilla Rust, Result<>, mandatory Ok-wrapping:

    impl FromStr for SectionName {
        type Err = AE;
        fn from_str(s: &str) -> AR<Self> {
            match s {
                COMMON => return Ok(SN::Common),
                LIMIT => return Ok(SN::GlobalLimit),
                _ => { }
            };
            if let Ok(n@ ServerName(_)) = s.parse() { return Ok(SN::Server(n)) }
            if let Ok(n@ ClientName(_)) = s.parse() { return Ok(SN::Client(n)) }
            if client == LIMIT { return Ok(SN::ServerLimit(server)) }
            let client = client.parse().context("client name in link section name")?;
            Ok(SN::Link(LinkName { server, client }))
        }
    }

edited 2022-12-18 19:58 UTC to improve, and 2022-12-18 23:28 to fix, formatting



9 December 2022

Russell Coker: USB-PD and GaN

(photo of 2 USB-PD chargers)

A recent development is cheap Gallium Nitride based power supplies that provide better efficiency in a smaller space than other technologies. Kogan recently had a special on such devices so I decided to try them out with my new Thinkpad X1 Carbon Gen 5 [1].

Google searches for power supplies for that Thinkpad included results for 30W PSUs, which implies that any 30W USB-C PSU should work. I bought a 30W charger for $10 that can supply 15V/2A or 20V/1.5A on a single USB-C port, or 15W on the USB-C port and 15W on the USB-2 port at the same time, and expected it to work as a laptop charger. Unfortunately it didn't. I don't know whether the adverts for 30W Thinkpad PSUs were false or whether the claim of the GaN charger I bought being 30W was false; all I know is that the KDE power applet said that the PSU couldn't supply enough power.

I then bought a 68W charger for $28 that can supply 20.0V/3.0A on a single USB-C port if the USB-2 port isn't used and 50W on the USB-C port if the USB-2 port is also being used. This worked well, which wasn't a great surprise as I had previously run the laptop on 45W PSUs. If I connect a phone to the USB-2 port while the laptop is being charged then the laptop will be briefly cut off; presumably the voltage and current are being renegotiated when that happens.

As you can see the 68W charger is significantly larger than the 30W charger, but still small enough to easily fit in a jacket pocket and smaller than a regular laptop charger. One of my uses for this will be to put it in a jacket pocket when I have my laptop in another pocket. Another use will be for charging in my car, as the cables from the inverter to convert 12VDC to 240VAC take up enough space. I will probably get a ~50W USB-PD charger that connects to a car cigarette lighter socket when a GaN version of such a charger becomes available.

8 December 2022

Russell Coker: Thinkpad X1 Carbon Gen5

Gen1

Since February 2018 I have been using a Thinkpad X1 Carbon Gen1 [1] as my main laptop. Generally I've been very happy with it: it's small and light, has good performance for web browsing etc, and with my transition to doing all compiles etc on servers it works well. When I wrote my original review I was unhappy with the keyboard, but I got used to that and found it to be reasonably good.

The things that I have found as limits on it are the display resolution, as 1600*900 isn't that great by modern standards (most phones are a lot higher resolution), the size (slightly too large for the pocket of my Scott e Vest [2] jacket), and the lack of USB-C. Modern laptops can charge via USB-C/Thunderbolt while also doing USB and DisplayPort video over the same cable. USB-C monitors which support charging a laptop over the same cable as used for video input are becoming common (last time I checked the Dell web site for many models of monitor there was a USB-C one that cost about $100 more). I work at a company with lots of USB-C monitors and docks so being able to use my personal laptop with the same displays when on breaks is really handy.

A final problem with the Gen1 is that it has a proprietary and unusual connector for the SSD, which means that a replacement SSD costs about what I paid for the entire laptop. Ever since the SSD gave a BTRFS checksum error I've been thinking of replacing it.

Choosing a Replacement

The Gen5 is the first Thinkpad X1 Carbon to have USB-C. For work I had used a Gen6 which was quite nice [3]. But it didn't seem to offer much over the Gen5. So I started looking for cheap Thinkpad X1 Carbons of Gen5+.

A Cheap? Gen5

In July I saw an ebay advert for a Gen5 with FullHD display for $370 or nearest offer, with the downside being that the BIOS password had been lost. I offered $330 and the seller accepted; in retrospect that was unusually cheap and should have been a clue that I needed to do further investigation.
It turned out that resetting the BIOS password is unusually difficult as it's in the TPM, so the system would only boot Windows. When I learned that I should have sold the laptop to someone who wanted to run Windows and bought another. Instead I followed some instructions on the Internet about entering a wrong password multiple times to get to a password recovery screen; instead the machine locked up entirely and became unusable for Windows (so don't do that).

Then I looked for ways of fixing the motherboard. The cheapest was $75.25 for a replacement BIOS flash chip that had a BIOS that didn't check the validity of passwords. The aim was to solder that on, set a new password (with any random text being accepted as the old password), then solder the old one back on for normal functionality. It turned out that I'm not good at fine soldering; after I had hacked at it a friend diagnosed the chip and motherboard to probably both be damaged (he couldn't get it going). The end solution was that my friend found a replacement motherboard for $170 from China. This gave a total cost of $575.25 for the laptop, which is more than the usual price of a Gen6 and more than I expected to pay.

In the past when advocating buying second hand or refurbished laptops people would say "what happens if you get one that doesn't work properly". The answer to that question is that I paid a lot less than the new cost of $2700+ for a Thinkpad X1 Carbon and got a computer that does everything I need. One of the advantages of getting a cheap laptop is that I won't be so unhappy if I happen to drop it.

A Cheap Gen6

After the failed experiment with a replacement BIOS on the Gen5 I was considering selling it for scrap. So I bought a Gen6 from Australian Computer Traders via Amazon for $390 in August. The advert clearly stated that it was for a laptop with USB-C and Thunderbolt (Gen5+ features) but they shipped me a Gen4 that didn't even have USB-C.
They eventually refunded me but I will try to avoid buying from them again.

Finally Working

The laptop I now have has an i5-6300U CPU that rates 3242 on cpubenchmark.net. My Gen1 Thinkpad has an i7-3667U CPU that rates 2378 on cpubenchmark.net; note that the cpubenchmark.net people have rescaled their benchmark since my review of the Gen1 in 2018. So according to the benchmarks my latest laptop is about 36% faster for CPU operations. Not much of a difference when comparing systems manufactured in 2012 and 2017! According to the benchmarks a medium to high end recent CPU will be more than 10 times faster than the one in my Gen5 laptop, but such a CPU would cost more than my laptop cost.

The storage is a 256G NVMe device that can do sustained reads at 900MB/s. That's not even twice as fast as the SSD in my Gen1 laptop, although NVMe is designed to perform better for small IO.

It has 2*USB-C ports, both of which can be used for charging, which is a significant benefit over the Gen6 I had for work in 2018 which only had one. I don't know why Lenovo made Gen6 machines that were lesser than Gen5 in such an important way. It can power my Desklab portable 4K monitor [4] but won't send a DisplayPort signal over the same USB-C cable. I don't know if this is a USB-C cable issue or some problem with the laptop recognising displays. It works nicely with Dell USB-C monitors and docks that power the laptop over the same cable as used for DisplayPort. Also the HDMI port works with 4K monitors, so at worst I could connect my Desklab monitor via a USB-C cable for power and HDMI for data.

The inability to change the battery without disassembly is still a problem, but hopefully USB-C connected batteries capable of charging such a laptop will become affordable in the near future, and I have had some practice at disassembling this laptop. It still has the Ethernet dongle annoyance, and of course the seller didn't include that. But USB ethernet devices are quite good and I have a few of them.
In conclusion it's worth the $575.25 I paid for it and would have been even better value for money if I had been a bit smarter when buying. It meets the initial criteria of USB-C power and display and of fitting in my jacket pocket as well as being slightly better than my old laptop in every other way.
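For anyone wanting to check the arithmetic behind the figures quoted in this post (the $575.25 total and the "about 36% faster" comparison), here is a small sketch. The function names are made up for this example; the numbers are the ones given above.

```rust
/// Sum of the individual purchase costs, in dollars.
fn total_cost(parts: &[f64]) -> f64 {
    parts.iter().sum()
}

/// How much faster (in percent) a benchmark score is than an older score.
fn percent_faster(new_score: f64, old_score: f64) -> f64 {
    (new_score / old_score - 1.0) * 100.0
}
```

Calling total_cost(&[330.0, 75.25, 170.0]) covers the laptop, the replacement flash chip and the replacement motherboard. Calling percent_faster(3242.0, 2378.0) compares the two cpubenchmark.net scores and comes out at roughly 36.3.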

16 November 2022

Ian Jackson: Stop writing Rust linked list libraries!

tl;dr: Don't write a Rust linked list library: they are hard to do well, and usually useless. Use VecDeque, which is great. If you actually need more than VecDeque can do, use one of the handful of libraries that actually offer a significantly more useful API. If you are writing your own data structure, check if someone has done it already, and consider slotmap or generational-arena (or maybe Rc/Arc).

Survey of Rust linked list libraries

I have updated my Survey of Rust linked list libraries.

Background

In 2019 I was writing plag-mangler, a tool for planar graph layout. I needed a data structure. Naturally I looked for a library to help. I didn't find what I needed, so I wrote rc-dlist-deque. However, on the way I noticed an inordinate number of linked list libraries written in Rust. Almost all of these had no real reason for existing. Even the one in the Rust standard library is useless.

Results

Now I have redone the survey. The results are depressing. In 2019 there were 5 libraries which, in my opinion, were largely useless. In late 2022 there are now thirteen linked list libraries that ought probably not ever to be used. And, a further eight libraries for which there are strictly superior alternatives. Many of these have the signs of projects whose authors are otherwise competent: proper documentation, extensive APIs, and so on. There is one new library which is better for some applications than those available in 2019. (I'm referring to generational_token_list, which makes a plausible alternative to dlv-list which I already recommended in 2019.)

Why are there so many poor Rust linked list libraries?

Linked lists and Rust do not go well together. But (and I'm guessing here) I presume many people are taught in programming school that a linked list is a fundamental data structure; people are often even asked to write one as a teaching exercise. This is a bad idea in Rust.
Or maybe they've heard that "writing linked lists in Rust is hard" and want to prove they can do it.

Double-ended queues

One of the main applications for a linked list in a language like C, is a queue, where you put items in at one end, and take them out at the other. The Rust standard library has a data structure for that, VecDeque. Five of the available libraries: For these you could, and should, just use VecDeque instead.

The Cursor concept

A proper linked list lets you identify and hold onto an element in the middle of the list, and cheaply insert and remove elements there. Rust's ownership and borrowing rules make this awkward. One idea that people have many times reinvented and reimplemented, is to have a Cursor type, derived from the list, which is a reference to an element, and permits insertion and removal there. Eight libraries have implemented this in the obvious way. However, there is a serious API limitation: To prevent a cursor being invalidated (e.g. by deletion of the entry it points to) you can't modify the list while the cursor exists. You can only have one cursor (that can be used for modification) at a time. The practical effect of this is that you cannot retain cursors. You can make and use such a cursor for a particular operation, but you must dispose of it soon. Attempts to do otherwise will see you losing a battle with the borrow checker.

If that's good enough, then you could just use a VecDeque and use array indices instead of the cursors. It's true that deleting or adding elements in the middle involves a lot of copying, but your algorithm is O(n) even with the single-cursor list libraries, because it must first walk the cursor to the desired element. Formally, I believe any algorithm using these exclusive cursors can be rewritten, in an obvious way, to simply iterate and/or copy from the start or end (as one can do with VecDeque) without changing the headline O() performance characteristics.
IMO the savings available from avoiding extra copies etc. are not worth the additional dependency, unsafe code, and so on, especially as there are other ways of helping with that (e.g. boxing the individual elements). Even if you don't find that convincing, generational_token_list and dlv-list are strictly superior since they offer a more flexible and convenient API and better performance, and rely on much less unsafe code.

Rustic approaches to pointers-to-and-between-nodes data structures

Most of the time a VecDeque is great. But if you actually want to hold onto (perhaps many) references to the middle of the list, and later modify it through those references, you do need something more. This is a specific case of a general class of problems where the naive approach (use Rust references to the data structure nodes) doesn't work well. But there is a good solution: Keep all the nodes in an array (a Vec<Option<T>> or similar) and use the index in the array as your node reference. This is fast, and quite ergonomic, and neatly solves most of the problems. If you are concerned that bare indices might cause confusion, as newly inserted elements would reuse indices, add a per-index generation count. These approaches have been neatly packaged up in libraries like slab, slotmap, generational-arena and thunderdome. And they have been nicely applied to linked lists by the authors of generational_token_list and dlv-list.

The alternative for nodey data structures in safe Rust: Rc/Arc

Of course, you can just use Rust's interior mutability and reference counting smart pointers, to directly implement the data structure of your choice. In many applications, a single-threaded data structure is fine, in which case Rc and Cell/RefCell will let you write safe code, with cheap refcount updates and runtime checks inserted to defend against unexpected aliasing, use-after-free, etc. I took this approach in rc-dlist-deque, because I wanted each node to be able to be on multiple lists.
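To make the index-plus-generation idea described above concrete, here is a minimal sketch of such an arena. All the names (Key, Slot, Arena) are hypothetical and the design is deliberately simplified; this is not the API of slab, slotmap, generational-arena or thunderdome, which you should use in preference to rolling your own.

```rust
/// A node reference: an array index plus the generation it was issued for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Key {
    index: usize,
    generation: u64,
}

struct Slot<T> {
    generation: u64,
    value: Option<T>,
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>, // indices of vacant slots, available for reuse
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    /// Store a value, reusing a vacant slot if there is one.
    fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Key { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Key { index: self.slots.len() - 1, generation: 0 }
        }
    }

    /// Look up a value; stale keys (wrong generation) yield None.
    fn get(&self, key: Key) -> Option<&T> {
        let slot = self.slots.get(key.index)?;
        if slot.generation != key.generation {
            return None;
        }
        slot.value.as_ref()
    }

    /// Remove a value and bump the generation, invalidating old keys.
    fn remove(&mut self, key: Key) -> Option<T> {
        let slot = self.slots.get_mut(key.index)?;
        if slot.generation != key.generation {
            return None;
        }
        let value = slot.value.take()?;
        slot.generation += 1;
        self.free.push(key.index);
        Some(value)
    }
}
```

A linked list built on this would store prev and next as Option<Key> inside each node. Holding many keys into the middle of the list is then no problem for the borrow checker, because a Key is plain Copy data rather than a reference, and a key whose element has been removed simply fails to resolve instead of pointing at whatever reused the slot.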
Rust's package ecosystem demonstrating software's NIH problem

The Rust ecosystem is full of NIH libraries of all kinds. In my survey, there are: five good options; seven libraries which are plausible, but just not as good as the alternatives; and fourteen others. There is a whole rant I could have about how the whole software and computing community is pathologically neophilic. Often we seem to actively resist reusing ideas, let alone code; and are ignorant and dismissive of what has gone before. As a result, we keep solving the same problems, badly - making the same mistakes over and over again. In some subfields, working software, or nearly working software, is frequently replaced with something worse, maybe more than once.

One aspect of this is a massive cultural bias towards rewriting rather than reusing, let alone fixing and using. Many people can come out of a degree, trained to be a programmer, and have no formal training in selecting and evaluating software; this is even though working effectively with computers requires making good use of everyone else's work. If one isn't taught these skills (when and how to search for prior art, how to choose between dependencies, and so on) one must learn them on the job. The result is usually an ad-hoc and unsystematic approach, often dominated by fashion rather than engineering.

The package naming paradox

The more experienced and competent programmer is aware of all the other options that exist - after all they have evaluated other choices before writing their own library. So they will call their library something like generational_token_list or vecdeque-stableix. Whereas the novice straight out of a pre-Rust programming course just thinks what they are doing is the one and only obvious thing (even though it's a poor idea) and hasn't even searched for a previous implementation. So they call their package something obvious like "linked list". As a result, the most obvious names seem to refer to the least useful libraries.
Edited 2022-11-16 23:55 UTC to update numbers of libraries in various categories following updates to the survey (including updates prompted by feedback received after this post first published).



Antoine Beaupré: Wayland: i3 to Sway migration

I started migrating my graphical workstations to Wayland, specifically migrating from i3 to Sway. This is mostly to address serious graphics bugs in the latest Framework laptop, but also something I felt was inevitable.

The current status is that I've been able to convert my i3 configuration to Sway, and adapt my systemd startup sequence to the new environment. Screen sharing only works with Pipewire, so I also did that migration, which basically requires an upgrade to Debian bookworm to get a nice enough Pipewire release. I'm testing Wayland on my laptop, but I'm not using it as a daily driver because I first need to upgrade to Debian bookworm on my main workstation.

Most irritants have been solved one way or the other. My main problem with Wayland right now is that I spent a frigging week doing the conversion: it's exciting and new, but it basically sucked the life out of all my other projects and it's distracting, and I want it to stop. The rest of this page documents why I made the switch, how it happened, and what's left to do. Hopefully it will keep you from spending as much time as I did in fixing this.

TL;DR: Wayland is mostly ready. Main blockers you might find are that you need to do manual configurations, DisplayLink (multiple monitors on a single cable) doesn't work in Sway, HDR and color management are still in development.

I had to install the following packages:
apt install \
    brightnessctl \
    foot \
    gammastep \
    gdm3 \
    grim slurp \
    pipewire-pulse \
    sway \
    swayidle \
    swaylock \
    wdisplays \
    wev \
    wireplumber \
    wlr-randr \
    xdg-desktop-portal-wlr
And did some tweaks in my $HOME, mostly dealing with my esoteric systemd startup sequence, which you won't have to deal with if you are not a fan.
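As a quick sanity check after installing the packages above, something like the following sketch can report which tools are still missing from $PATH. The function name `missing_commands` is made up for this example, and it assumes each binary is named after its package, which holds for the ones listed in the usage comment but not necessarily for every package (xdg-desktop-portal-wlr, for instance, is not normally on $PATH).

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found in $PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example usage (binary names assumed to match the package names):
# missing_commands sway swayidle swaylock foot grim slurp wev wlr-randr
```

An empty output means everything was found.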

Why switch? I originally held back from migrating to Wayland: it seemed like a complicated endeavor hardly worth the cost. It also didn't seem actually ready. But after reading this blurb on LWN, I decided to at least document the situation here. The actual quote that convinced me it might be worth it was:
It's amazing. I have never experienced gaming on Linux that looked this smooth in my life.
... I'm not a gamer, but I do care about latency. The longer version is worth a read as well. The point here is not to bash one side or the other, or even do a thorough comparison. I start with the premise that Xorg is likely going away in the future and that I will need to adapt some day. In fact, the last major Xorg release (21.1, October 2021) is rumored to be the last ("just like the previous release...", that said, minor releases are still coming out, e.g. 21.1.4). Indeed, it seems even core Xorg people have moved on to developing Wayland, or at least Xwayland, which was spun off into its own source tree. X, or at least Xorg, is in maintenance mode and has been for years. Granted, the X Window System is getting close to forty years old at this point: it got us amazingly far for something that was designed around the time of the first graphical interfaces. Since Mac and (especially?) Windows released theirs, they have rebuilt their graphical backends numerous times, but UNIX derivatives have stuck on Xorg this entire time, which is a testament to the design and reliability of X. (Or our incapacity at developing meaningful architectural change across the entire ecosystem, take your pick I guess.) What pushed me over the edge is that I had some pretty bad driver crashes with Xorg while screen sharing under Firefox, in Debian bookworm (around November 2022). The symptom would be that the UI would completely crash, reverting to a text-only console, while Firefox would keep running, audio and everything still working. People could still see my screen, but I couldn't, of course, let alone interact with it. All processes were still running, including Xorg. (And no, sorry, I haven't reported that bug, maybe I should have, and it's actually possible it comes up again in Wayland, of course. But at first, screen sharing didn't work of course, so it's coming a much further way. 
After making screen sharing work, though, the bug didn't occur again, so I consider this a Xorg-specific problem until further notice.) There were also frustrating glitches in the UI, in general. I actually had to setup a compositor alongside i3 to make things bearable at all. Video playback in a window was laggy, sluggish, and out of sync. Wayland fixed all of this.

Wayland equivalents This section documents each tool I have picked as an alternative to the current Xorg tool I am using for the task at hand. It also touches on other alternatives and how the tool was configured. Note that this list is based on the series of tools I use in desktop. TODO: update desktop with the following when done, possibly moving old configs to a ?xorg archive.

Window manager: i3 → sway This seems like kind of a no-brainer. Sway is around, it's feature-complete, and it's in Debian. I'm a bit worried about the "Drew DeVault community", to be honest. There's a certain aggressiveness in the community I don't like so much; at least an open hostility towards more modern UNIX tools like containers and systemd that makes it hard to do my work while interacting with that community. I'm also concerned about the lack of unit tests and user manual for Sway. The i3 window manager has been designed by a fellow (ex-)Debian developer I have a lot of respect for (Michael Stapelberg), partly because of i3 itself, but also working with him on other projects. Beyond the characters, i3 has a user guide, a code of conduct, and lots more documentation. It has a test suite. Sway has... manual pages, with the homepage just telling users to use man -k sway to find what they need. I don't think we need that kind of elitism in our communities, to put this bluntly. But let's put that aside: Sway is still a no-brainer. It's the easiest thing to migrate to, because it's mostly compatible with i3. I had to immediately fix those resources to get a minimal session going:
i3 | Sway | note
set_from_resources | set | no support for X resources, naturally
new_window pixel 1 | default_border pixel 1 | actually supported in i3 as well
That's it. All of the other changes I had to do (and there were actually a lot) were all Wayland-specific changes, not Sway-specific changes. For example, use brightnessctl instead of xbacklight to change the backlight levels. See a copy of my full sway/config for details. Other options include:
  • dwl: tiling, minimalist, dwm for Wayland, not in Debian
  • Hyprland: tiling, fancy animations, not in Debian
  • Qtile: tiling, extensible, in Python, not in Debian (1015267)
  • river: Zig, stackable, tagging, not in Debian (1006593)
  • velox: inspired by xmonad and dwm, not in Debian
  • vivarium: inspired by xmonad, not in Debian

Status bar: py3status → waybar I have invested quite a bit of effort in setting up my status bar with py3status. It supports Sway directly, and did not actually require any change when migrating to Wayland. Unfortunately, I had trouble making nm-applet work. Based on this nm-applet.service, I found that you need to pass --indicator for it to show up at all. In theory, tray icon support was merged in 1.5, but in practice there are still several limitations, like icons not clickable. Also, on startup, nm-applet --indicator triggers this error in the Sway logs:
nov 11 22:34:12 angela sway[298938]: 00:49:42.325 [INFO] [swaybar/tray/host.c:24] Registering Status Notifier Item ':1.47/org/ayatana/NotificationItem/nm_applet'
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet IconPixmap: No such property  IconPixmap 
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet AttentionIconPixmap: No such property  AttentionIconPixmap 
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet ItemIsMenu: No such property  ItemIsMenu 
nov 11 22:36:10 angela sway[313419]: info: fcft.c:838: /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf: size=24.00pt/32px, dpi=96.00
... but that seems innocuous. The tray icon displays but is not clickable. Note that there is currently (November 2022) a pull request to hook up a "Tray D-Bus Menu" which, according to Reddit might fix this, or at least be somewhat relevant. If you don't see the icon, check the bar.tray_output property in the Sway config, try: tray_output *. The non-working tray was the biggest irritant in my migration. I have used nmtui to connect to new Wifi hotspots or change connection settings, but that doesn't support actions like "turn off WiFi". I eventually fixed this by switching from py3status to waybar, which was another yak horde shaving session, but ultimately, it worked.

Web browser: Firefox Firefox has had support for Wayland for a while now, with the team enabling it by default in nightlies around January 2022. It's actually not easy to figure out the state of the port, the meta bug report is still open and it's huge: it currently (Sept 2022) depends on 76 open bugs, it was opened twelve (2010) years ago, and it's still getting daily updates (mostly linking to other tickets). Firefox 106 presumably shipped with "Better screen sharing for Windows and Linux Wayland users", but I couldn't quite figure out what those were. TL;DR: echo MOZ_ENABLE_WAYLAND=1 >> ~/.config/environment.d/firefox.conf && apt install xdg-desktop-portal-wlr

How to enable it Firefox depends on this silly variable to start correctly under Wayland (otherwise it starts inside Xwayland and looks fuzzy and fails to screen share):
MOZ_ENABLE_WAYLAND=1 firefox
To make the change permanent, many recipes recommend adding this to an environment startup script:
if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    export MOZ_ENABLE_WAYLAND=1
fi
At least that's the theory. In practice, Sway doesn't actually run any startup shell script, so that can't possibly work. Furthermore, XDG_SESSION_TYPE is not actually set when starting Sway from gdm3 which I find really confusing, and I'm not the only one. So the above trick doesn't actually work, even if the environment (XDG_SESSION_TYPE) is set correctly, because we don't have conditionals in environment.d(5). (Note that systemd.environment-generator(7) does support running arbitrary commands to generate environment, but for some reason does not support user-specific configuration files... Even then it may be a solution to have a conditional MOZ_ENABLE_WAYLAND environment, but I'm not sure it would work because ordering between those two isn't clear: maybe the XDG_SESSION_TYPE wouldn't be set just yet...) At first, I made this ridiculous script to work around those issues. Really, it seems to me Firefox should just parse the XDG_SESSION_TYPE variable here... but then I realized that Firefox works fine in Xorg when the MOZ_ENABLE_WAYLAND is set. So now I just set that variable in environment.d and It Just Works:
MOZ_ENABLE_WAYLAND=1
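Since environment.d files are plain KEY=VALUE lines, appending with `>>` more than once leaves duplicate entries. The following is a minimal sketch of an idempotent way to add the line; the helper name `ensure_line` is hypothetical and the file path is taken from the TL;DR above.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only when FILE does not
# already contain exactly that line. Creates the directory if needed.
ensure_line() {
    file=$1
    line=$2
    mkdir -p "$(dirname "$file")"
    # -x: match whole lines only, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage:
# ensure_line "$HOME/.config/environment.d/firefox.conf" "MOZ_ENABLE_WAYLAND=1"
```

Running it twice leaves a single MOZ_ENABLE_WAYLAND=1 line in the file. The variable is only picked up on the next start of the user session.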

Screen sharing Out of the box, screen sharing doesn't work until you install xdg-desktop-portal-wlr or similar (e.g. xdg-desktop-portal-gnome on GNOME). I had to reboot for the change to take effect. Without those tools, it shows the usual permission prompt with "Use operating system settings" as the only choice, but when we accept... nothing happens. After installing the portals, it actually works, and works well! This was tested in Debian bookworm/testing with Firefox ESR 102 and Firefox 106. Major caveat: we can only share a full screen, we can't currently share just a window. The major upside to that is that, by default, it streams only one output which is actually what I want most of the time! See the screencast compatibility for more information on what is supposed to work. This is actually a huge improvement over the situation in Xorg, where Firefox can only share a window or all monitors, which led me to use Chromium a lot for video-conferencing. With this change, in other words, I will not need Chromium for anything anymore, whoohoo! If slurp, wofi, or bemenu are installed, one of them will be used to pick the monitor to share, which effectively acts as some minimal security measure. See xdg-desktop-portal-wlr(1) for how to configure that.

Side note: Chrome fails to share a full screen I was still using Google Chrome (or, more accurately, Debian's Chromium package) for some videoconferencing. It's mainly because Chromium was the only browser which would allow me to share only one of my two monitors, which is extremely useful. To start chrome with the Wayland backend, you need to use:
chromium --enable-features=UseOzonePlatform --ozone-platform=wayland
If it shows an ugly gray border, check the Use system title bar and borders setting. It can do some screensharing. Sharing a window and a tab seems to work, but sharing a full screen doesn't: it's all black. Maybe not ready for prime time. And since Firefox can do what I need under Wayland now, I will not need to fight with Chromium to work under Wayland:
apt purge chromium
Note that a similar fix was necessary for Signal Desktop, see this commit. Basically you need to figure out a way to pass those same flags to signal:
--enable-features=WaylandWindowDecorations --ozone-platform-hint=auto

Email: notmuch See Emacs, below.

File manager: thunar Unchanged.

News: feed2exec, gnus See Email, above, or Emacs in Editor, below.

Editor: Emacs okay-ish Emacs is being actively ported to Wayland. According to this LWN article, the first (partial, to Cairo) port was done in 2014 and a working port (to GTK3) was completed in 2021, but wasn't merged until late 2021. That is: after Emacs 28 was released (April 2022). So we'll probably need to wait for Emacs 29 to have native Wayland support in Emacs, which, in turn, is unlikely to arrive in time for the Debian bookworm freeze. There are, however, unofficial builds for both Emacs 28 and 29 provided by spwhitton which may provide native Wayland support. I tested the snapshot packages and they do not quite work well enough. First off, they completely take over the builtin Emacs (they hijack the $PATH in /etc!) and certain things are simply not working in my setup. For example, this hook never gets run on startup:
(add-hook 'after-init-hook 'server-start t) 
Still, like many X11 applications, Emacs mostly works fine under Xwayland. The clipboard works as expected, for example. Scaling is a bit of an issue: fonts look fuzzy. I have heard anecdotal evidence of hard lockups with Emacs running under Xwayland as well, but haven't experienced any problem so far. I did experience a Wayland crash with the snapshot version however. TODO: look again at Wayland in Emacs 29.

Backups: borg Mostly irrelevant, as I do not use a GUI.

Color theme: srcery, redshift → gammastep I am keeping Srcery as a color theme, in general. Redshift is another story: it has no support for Wayland out of the box, but it's apparently possible to apply a hack on the TTY before starting Wayland, with:
redshift -m drm -PO 3000
This tip is from the arch wiki which also has other suggestions for Wayland-based alternatives. Both KDE and GNOME have their own "red shifters", and for wlroots-based compositors, they (currently, Sept. 2022) list the following alternatives: I configured gammastep with a simple gammastep.service file associated with the sway-session.target.
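For illustration only, here is a sketch of what a user unit tied to sway-session.target could look like, written out by a small shell function. This is not the actual gammastep.service referred to above: the `write_gammastep_unit` name, the `/usr/bin/gammastep` path, the `PartOf=` line and the restart policy are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: write an illustrative gammastep.service into
# the given directory (e.g. ~/.config/systemd/user).
write_gammastep_unit() {
    unit_dir=$1
    mkdir -p "$unit_dir"
    cat > "$unit_dir/gammastep.service" <<'EOF'
[Unit]
Description=gammastep color temperature adjuster (illustrative sketch)
# Assumption: stop the service when the graphical session stops
PartOf=graphical-session.target

[Service]
# Assumption: binary location; adjust to your system
ExecStart=/usr/bin/gammastep
Restart=on-failure

[Install]
# Start together with the Sway session target mentioned in the text
WantedBy=sway-session.target
EOF
}

# Example usage:
# write_gammastep_unit "$HOME/.config/systemd/user"
# systemctl --user daemon-reload
# systemctl --user enable gammastep.service
```

Hooking the unit to the session target rather than default.target keeps it from starting on non-graphical logins.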

Display manager: lightdm → gdm3 Switched because lightdm failed to start sway:
nov 16 16:41:43 angela sway[843121]: 00:00:00.002 [ERROR] [wlr] [libseat] [common/terminal.c:162] Could not open target tty: Permission denied
Possible alternatives:

Terminal: xterm → foot One of the biggest question marks in this transition was what to do about Xterm. After writing two articles about terminal emulators as a professional journalist, decades of working on the terminal, and probably using dozens of different terminal emulators, I'm still not happy with any of them. This is such a big topic that I actually have an entire blog post specifically about this. For starters, using xterm under Xwayland works well enough, although the font scaling makes things look a bit too fuzzy. I have also tried foot: it ... just works! Fonts are much crisper than Xterm and Emacs. URLs are not clickable but the URL selector (control-shift-u) is just plain awesome (think "vimperator" for the terminal). There's a cool hack to jump between prompts. Copy-paste works. True colors work. The word-wrapping is excellent: it doesn't lose one byte. Emojis are nicely sized and colored. Font resize works. There's even scroll back search (control-shift-r). Foot went from a question mark to being a reason to switch to Wayland, just for this little goodie, which says a lot about the quality of that software. The selection clicks are not quite what I would expect though. In rxvt and others, you have the following patterns:
  • single click: reset selection, or drag to select
  • double: select word
  • triple: select quotes or line
  • quadruple: select line
I particularly find the "select quotes" bit useful. It seems like foot just supports double and triple clicks, with word and line selected. You can select a rectangle with control,. It correctly extends the selection word-wise with right click if double-click was first used. One major problem with Foot is that it's a new terminal, with its own termcap entry. Support for foot was added to ncurses in the 20210731 release, which was shipped after the current Debian stable release (Debian bullseye, which ships 6.2+20201114-2). A workaround for this problem is to install the foot-terminfo package on the remote host, which is available in Debian stable. This should eventually resolve itself, as Debian bookworm has a newer version. Note that some corrections were also shipped in the 20211113 release, but that is also shipped in Debian bookworm. That said, I am almost certain I will have to revert to xterm under Xwayland at some point in the future. Back when I was using GNOME Terminal, it would mostly work for everything until I had to use the serial console on a (HP ProCurve) network switch, which has a fancy TUI that was basically unusable there. I fully expect such problems with foot, or any other terminal than xterm, for that matter. The foot wiki has good troubleshooting instructions as well. Update: I did find one tiny thing to improve with foot, and it's the default logging level which I found pretty verbose. After discussing it with the maintainer on IRC, I submitted this patch to tweak it, which I described like this on Mastodon:
today's reason why i will go to hell when i die (TRWIWGTHWID?): a 600-word, 63 lines commit log for a one line change: https://codeberg.org/dnkl/foot/pulls/1215
It's Friday.
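On remote hosts where installing foot-terminfo is not an option, a fallback along these lines may help with the missing terminfo entry described above. This is only a sketch: the `pick_term` name is made up, and falling back to xterm-256color is an assumption that may lose some foot-specific capabilities.

```shell
#!/bin/sh
# Hypothetical helper: echo TERM_NAME if this host has a terminfo
# entry for it, otherwise echo FALLBACK (default: xterm-256color).
pick_term() {
    term_name=$1
    fallback=${2:-xterm-256color}
    # infocmp exits non-zero when the entry is unknown (or when
    # infocmp itself is not installed), so both cases fall back
    if infocmp "$term_name" >/dev/null 2>&1; then
        printf '%s\n' "$term_name"
    else
        printf '%s\n' "$fallback"
    fi
}

# Example usage, e.g. in a remote shell startup file:
# TERM=$(pick_term "$TERM"); export TERM
```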

Launcher: rofi → rofi?? rofi does not support Wayland. There was a rather disgraceful battle in the pull request that led to the creation of a fork (lbonn/rofi), so it's unclear how that will turn out. Given how relatively trivial the problem space is, there is of course a profusion of options:
Tool | In Debian | Notes
alfred | yes | general launcher/assistant tool
bemenu | yes, bookworm+ | inspired by dmenu
cerebro | no | Javascript ... uh... thing
dmenu-wl | no | fork of dmenu, straight port to Wayland
Fuzzel | ITP 982140 | dmenu/drun replacement, app icon overlay
gmenu | no | drun replacement, with app icons
kickoff | no | dmenu/run replacement, fuzzy search, "snappy", history, copy-paste, Rust
krunner | yes | KDE's runner
mauncher | no | dmenu/drun replacement, math
nwg-launchers | no | dmenu/drun replacement, JSON config, app icons, nwg-shell project
Onagre | no | rofi/alfred inspired, multiple plugins, Rust
menu | no | dmenu/drun rewrite
Rofi (lbonn's fork) | no | see above
sirula | no | .desktop based app launcher
Ulauncher | ITP 949358 | generic launcher like Onagre/rofi/alfred, might be overkill
tofi | yes, bookworm+ | dmenu/drun replacement, C
wmenu | no | fork of dmenu-wl, but mostly a rewrite
Wofi | yes | dmenu/drun replacement, not actively maintained
yofi | no | dmenu/drun replacement, Rust
The above list comes partly from https://arewewaylandyet.com/ and awesome-wayland. It is likely incomplete. I have read some good things about bemenu, fuzzel, and wofi. A particularly tricky issue is that my rofi password management depends on xdotool for some operations. At first, I thought this was just going to be (thankfully?) impossible, because we actually like the idea that one app cannot send keystrokes to another. But it seems there are actually alternatives to this, like wtype or ydotool, the latter of which requires root access. wl-ime-type does that through the input-method-unstable-v2 protocol (sample emoji picker), but is not packaged in Debian. As it turns out, wtype just works as expected, and fixing this was basically a two-line patch. Another alternative, not in Debian, is wofi-pass. The other problem is that I actually heavily modified rofi. I use "modis" which are not actually implemented in wofi or tofi, so I'm left with reinventing those wheels from scratch or using the rofi + wayland fork... It's really too bad that fork isn't being reintegrated... For now, I'm actually still using rofi under Xwayland. The main downside is that fonts are fuzzy, but it otherwise just works. Note that wlogout could be a partial replacement (just for the "power menu").
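To give an idea of how small the xdotool to wtype change can be, here is a sketch that picks the typing tool based on the session. It is not the actual two-line patch mentioned above; the function names are invented, and using WAYLAND_DISPLAY as the test is an assumption (it is also set for X11 clients running under Xwayland, where wtype is still the tool that reaches native Wayland windows).

```shell
#!/bin/sh
# Hypothetical helper: name the keystroke-injection tool to use.
# Assumption: a non-empty WAYLAND_DISPLAY means a Wayland session.
typing_tool() {
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wtype
    else
        echo xdotool
    fi
}

# Hypothetical helper: type the given text into the focused window.
type_text() {
    case "$(typing_tool)" in
        wtype)   wtype "$1" ;;
        xdotool) xdotool type "$1" ;;
    esac
}

# Example usage:
# type_text "hello world"
```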

Image viewers: geeqie → ? I'm not very happy with geeqie in the first place, and I suspect the Wayland switch will just add impossible things on top of the things I already find irritating (Geeqie doesn't support copy-pasting images). In practice, Geeqie doesn't seem to work so well under Wayland. The fonts are fuzzy and the thumbnail preview just doesn't work anymore (filed as Debian bug 1024092). It seems it also has problems with scaling. Alternatives: See also this list and that list for other lists of image viewers, not necessarily ported to Wayland. TODO: pick an alternative to geeqie, nomacs would be gorgeous if it wouldn't be basically abandoned upstream (no release since 2020), has an unpatched CVE-2020-23884 since July 2020, does bad vendoring, and is in bad shape in Debian (4 minor releases behind). So for now I'm still grumpily using Geeqie.

Media player: mpv, gmpc / sublime This is basically unchanged. mpv seems to work fine under Wayland, better than Xorg on my new laptop (as mentioned in the introduction), and that's before the version which improves Wayland support significantly, by bringing native Pipewire support and DMA-BUF support. gmpc is more of a problem, mainly because it is abandoned. See 2022-08-22-gmpc-alternatives for the full discussion, one of the alternatives there will likely support Wayland. Finally, I might just switch to sublime-music instead... In any case, not many changes here, thankfully.

Screensaver: xsecurelock → swaylock I was previously using xss-lock and xsecurelock as a screensaver, with xscreensaver "hacks" as a backend for xsecurelock. The basic screensaver in Sway seems to be built with swayidle and swaylock. It's interesting because it's the same "split" design as xss-lock and xsecurelock. That, unfortunately, does not include the fancy "hacks" provided by xscreensaver, and that is unlikely to be implemented upstream. Other alternatives include gtklock and waylock (zig), which do not solve that problem either. There is swaylock-plugin, a swaylock fork, which at least attempts to solve this problem, although not directly using the real xscreensaver hacks. swaylock-effects is another attempt at this, but it only adds more effects, it doesn't delegate the image display. Other than that, maybe it's time to just let go of those funky animations and just let swaylock do its thing, which is display a static image or just a black screen, which is fine by me. In the end, I am just using swayidle with a configuration based on the systemd integration wiki page but with additional tweaks from this service, see the resulting swayidle.service file. Interestingly, damjan also has a service for swaylock itself, although it's not clear to me what its purpose is...

Screenshot: maim → grim, pubpaste I'm a heavy user of maim (and a package uploader in Debian). It looks like the direct replacement to maim (and slop) is grim (and slurp). There's also swappy which goes on top of grim and allows preview/edit of the resulting image, nice touch (not in Debian though). See also awesome-wayland screenshots for other alternatives: there are many, including X11 tools like Flameshot that also support Wayland. One key problem here was that I have my own screenshot / pastebin software which needed an update for Wayland as well. That, thankfully, meant actually cleaning up a lot of horrible code that involved calling xterm and xmessage for user interaction. Now, pubpaste uses GTK for prompts and looks much better. (And before anyone freaks out, I already had to use GTK for proper clipboard support, so this isn't much of a stretch...)
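As a rough illustration of the grim and slurp pairing, the sketch below selects a region with slurp and saves it with grim under a timestamped name. It has nothing to do with how pubpaste is implemented; the function names and the filename pattern are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: build a timestamped PNG path inside DIR
# (default: $HOME).
screenshot_path() {
    dir=${1:-$HOME}
    printf '%s/screenshot-%s.png\n' "$dir" "$(date +%Y%m%dT%H%M%S)"
}

# Hypothetical helper: let the user select a region with slurp,
# capture it with grim, then print the resulting file name.
take_screenshot() {
    path=$(screenshot_path "${1:-}")
    grim -g "$(slurp)" "$path" && printf '%s\n' "$path"
}

# Example usage:
# take_screenshot "$HOME/Pictures"
```

slurp prints the selected geometry on standard output, which is the format grim's -g option expects.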

Screen recorder: simplescreenrecorder → wf-recorder In Xorg, I have used both peek and simplescreenrecorder for screen recordings. The former will work in Wayland, but has no sound support. The latter has a fork with Wayland support but it is limited and buggy ("doesn't support recording area selection and has issues with multiple screens"). It looks like wf-recorder will just do everything correctly out of the box, including audio support (with --audio, duh). It's also packaged in Debian. One has to wonder how this works while keeping the "between app security" that Wayland promises, however... Would installing such a program make my system less secure? Many other options are available, see the awesome Wayland screencasting list.

RSI: workrave → nothing? Workrave has no support for Wayland. activity watch is a time tracker alternative, but is not an RSI watcher. KDE has rsiwatcher, but that's a bit too much on the heavy side for my taste. SafeEyes looks like an alternative at first, but it has many issues under Wayland (escape doesn't work, idle doesn't work, it just doesn't work really). timekpr-next could be an alternative as well, and has support for Wayland. I am also considering just abandoning workrave, even if I stick with Xorg, because it apparently introduces significant latency in the input pipeline. And besides, I've developed a pretty unhealthy alert fatigue with Workrave. I have used the program for so long that my fingers know exactly where to click to dismiss those warnings very effectively. It makes my work just more irritating, and doesn't fix the fundamental problem I have with computers.

Other apps This is a constantly changing list, of course. There's a bit of a "death by a thousand cuts" in migrating to Wayland because you realize how many things you were using are tightly bound to X.
  • .Xresources - just say goodbye to that old resource system, it was used, in my case, only for rofi, xterm, and ... Xboard!?
  • keyboard layout switcher: built-in to Sway since 2017 (PR 1505, 1.5rc2+), requires a small configuration change, see this answer as well, looks something like this command:
     swaymsg input 0:0:X11_keyboard xkb_layout de
    
    or using this config:
     input * {
         xkb_layout "ca,us"
         xkb_options "grp:sclk_toggle"
     }
    
    That works refreshingly well, even better than in Xorg, I must say. swaykbdd is an alternative that supports per-window layouts (in Debian).
  • wallpaper: currently using feh, will need a replacement, TODO: figure out something that does, like feh, a random shuffle. swaybg just loads a single image, duh. oguri might be a solution, but unmaintained, used here, not in Debian. wallutils is another option, also not in Debian. For now I just don't have a wallpaper, the background is a solid gray, which is better than Xorg's default (which is whatever crap was left around a buffer by the previous collection of programs, basically)
  • notifications: currently using dunst in some places, which works well in both Xorg and Wayland, not a blocker, salut a possible alternative (not in Debian), damjan uses mako. TODO: install dunst everywhere
  • notification area: I had trouble making nm-applet work. based on this nm-applet.service, I found that you need to pass --indicator. In theory, tray icon support was merged in 1.5, but in practice there are still several limitations, like icons not clickable. On startup, nm-applet --indicator triggers this error in the Sway logs:
     nov 11 22:34:12 angela sway[298938]: 00:49:42.325 [INFO] [swaybar/tray/host.c:24] Registering Status Notifier Item ':1.47/org/ayatana/NotificationItem/nm_applet'
     nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet IconPixmap: No such property  IconPixmap 
     nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet AttentionIconPixmap: No such property  AttentionIconPixmap 
     nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet ItemIsMenu: No such property  ItemIsMenu 
     nov 11 22:36:10 angela sway[313419]: info: fcft.c:838: /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf: size=24.00pt/32px, dpi=96.00
    
    ... but it seems innocuous. The tray icon displays but, as stated above, is not clickable. If you don't see the icon, check the bar.tray_output property in the Sway config, try: tray_output *. Note that there is currently (November 2022) a pull request to hook up a "Tray D-Bus Menu" which, according to Reddit might fix this, or at least be somewhat relevant. This was the biggest irritant in my migration. I have used nmtui to connect to new Wifi hotspots or change connection settings, but that doesn't support actions like "turn off WiFi". I eventually fixed this by switching from py3status to waybar.
  • window switcher: in i3 I was using this bespoke i3-focus script, which doesn't work under Sway, swayr an option, not in Debian. So I put together this other bespoke hack from multiple sources, which works.
  • PDF viewer: currently using atril (which supports Wayland), could also just switch to zathura/mupdf permanently, see also calibre for a discussion on document viewers
See also this list of useful addons and this other list for other app alternatives.

More X11 / Wayland equivalents For all the tools above, it's not exactly clear what options exist in Wayland, or when they do, which one should be used. But for some basic tools, it seems the options are actually quite clear. If that's the case, they should be listed here:
X11 | Wayland | In Debian
arandr | wdisplays | yes
autorandr | kanshi | yes
xdotool | wtype | yes
xev | wev | yes
xlsclients | swaymsg -t get_tree | yes
xrandr | wlr-randr | yes
lswt is a more direct replacement for xlsclients but is not packaged in Debian. See also: Note that arandr and autorandr are not directly part of X. arewewaylandyet.com refers to a few alternatives. We suggest wdisplays and kanshi above (see also this service file) but wallutils can also do the autorandr stuff, apparently, and nwg-displays can do the arandr part. Neither is packaged in Debian yet. So I have tried wdisplays and it Just Works, and well. The UI even looks better and more usable than arandr, so another clean win from Wayland here. TODO: test kanshi as an autorandr replacement

Other issues

systemd integration I've had trouble getting session startup to work. This is partly because I had a kind of funky system to start my session in the first place. I used to have my whole session started from .xsession like this:
#!/bin/sh
. ~/.shenv
systemctl --user import-environment
exec systemctl --user start --wait xsession.target
But obviously, the xsession.target is not started by the Sway session. It seems to just start a default.target, which is really not what we want because we want to associate the services directly with the graphical-session.target, so that they don't start when logging in over (say) SSH. damjan on #debian-systemd showed me his sway-setup which features systemd integration. It involves starting a different session in a completely new .desktop file. That work was submitted upstream but refused on the grounds that "I'd rather not give a preference to any particular init system." Another PR was abandoned because "restarting sway does not makes sense: that kills everything". The work was therefore moved to the wiki. So. Not a great situation. The upstream wiki systemd integration suggests starting the systemd target from within Sway, which has all sorts of problems:
  • you don't get Sway logs anywhere
  • control groups are all messed up
I have done a lot of work trying to figure this out, but I remember that starting systemd from Sway didn't actually work for me: my previously configured systemd units didn't correctly start, and especially not with the right $PATH and environment. So I went down that rabbit hole and managed to correctly configure Sway to be started from the systemd --user session. I have partly followed the wiki but also picked ideas from damjan's sway-setup and xdbob's sway-services. Another option is uwsm (not in Debian). This is the config I have in .config/systemd/user/: I have also configured those services, but that's somewhat optional: You will also need at least part of my sway/config, which sends the systemd notification (because, no, Sway doesn't support any sort of readiness notification, that would be too easy). And you might like to see my swayidle-config while you're there. Finally, you need to hook this up somehow to the login manager. This is typically done with a desktop file, so drop sway-session.desktop in /usr/share/wayland-sessions and sway-user-service somewhere in your $PATH (typically /usr/bin/sway-user-service). The session then looks something like this:
$ systemd-cgls | head -101
Control group /:
-.slice
 user.slice (#472)
    user.invocation_id: bc405c6341de4e93a545bde6d7abbeec
    trusted.invocation_id: bc405c6341de4e93a545bde6d7abbeec
   user-1000.slice (#10072)
      user.invocation_id: 08f40f5c4bcd4fd6adfd27bec24e4827
      trusted.invocation_id: 08f40f5c4bcd4fd6adfd27bec24e4827
     user@1000.service   (#10156)
        user.delegate: 1
        trusted.delegate: 1
        user.invocation_id: 76bed72a1ffb41dca9bfda7bb174ef6b
        trusted.invocation_id: 76bed72a1ffb41dca9bfda7bb174ef6b
       session.slice (#10282)
         xdg-document-portal.service (#12248)
           9533 /usr/libexec/xdg-document-portal
           9542 fusermount3 -o rw,nosuid,nodev,fsname=portal,auto_unmount,subt 
         xdg-desktop-portal.service (#12211)
           9529 /usr/libexec/xdg-desktop-portal
         pipewire-pulse.service (#10778)
           6002 /usr/bin/pipewire-pulse
         wireplumber.service (#10519)
           5944 /usr/bin/wireplumber
         gvfs-daemon.service (#10667)
           5960 /usr/libexec/gvfsd
         gvfs-udisks2-volume-monitor.service (#10852)
           6021 /usr/libexec/gvfs-udisks2-volume-monitor
         at-spi-dbus-bus.service (#11481)
           6210 /usr/libexec/at-spi-bus-launcher
           6216 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2 
           6450 /usr/libexec/at-spi2-registryd --use-gnome-session
         pipewire.service (#10403)
           5940 /usr/bin/pipewire
         dbus.service (#10593)
           5946 /usr/bin/dbus-daemon --session --address=systemd: --nofork --n 
       background.slice (#10324)
         tracker-miner-fs-3.service (#10741)
           6001 /usr/libexec/tracker-miner-fs-3
       app.slice (#10240)
         xdg-permission-store.service (#12285)
           9536 /usr/libexec/xdg-permission-store
         gammastep.service (#11370)
           6197 gammastep
         dunst.service (#11958)
           7460 /usr/bin/dunst
         wterminal.service (#13980)
           69100 foot --title pop-up
           69101 /bin/bash
           77660 sudo systemd-cgls
           77661 head -101
           77662 wl-copy
           77663 sudo systemd-cgls
           77664 systemd-cgls
         syncthing.service (#11995)
           7529 /usr/bin/syncthing -no-browser -no-restart -logflags=0 --verbo 
           7537 /usr/bin/syncthing -no-browser -no-restart -logflags=0 --verbo 
         dconf.service (#10704)
           5967 /usr/libexec/dconf-service
         gnome-keyring-daemon.service (#10630)
           5951 /usr/bin/gnome-keyring-daemon --foreground --components=pkcs11 
         gcr-ssh-agent.service (#10963)
           6035 /usr/libexec/gcr-ssh-agent /run/user/1000/gcr
         swayidle.service (#11444)
           6199 /usr/bin/swayidle -w
         nm-applet.service (#11407)
           6198 /usr/bin/nm-applet --indicator
         wcolortaillog.service (#11518)
           6226 foot colortaillog
           6228 /bin/sh /home/anarcat/bin/colortaillog
           6230 sudo journalctl -f
           6233 ccze -m ansi
           6235 sudo journalctl -f
           6236 journalctl -f
         afuse.service (#10889)
           6051 /usr/bin/afuse -o mount_template=sshfs -o transform_symlinks - 
         gpg-agent.service (#13547)
           51662 /usr/bin/gpg-agent --supervised
           51719 scdaemon --multi-server
         emacs.service (#10926)
            6034 /usr/bin/emacs --fg-daemon
           33203 /usr/bin/aspell -a -m -d en --encoding=utf-8
         xdg-desktop-portal-gtk.service (#12322)
           9546 /usr/libexec/xdg-desktop-portal-gtk
         xdg-desktop-portal-wlr.service (#12359)
           9555 /usr/libexec/xdg-desktop-portal-wlr
         sway.service (#11037)
           6037 /usr/bin/sway
           6181 swaybar -b bar-0
           6209 py3status
           6309 /usr/bin/i3status -c /tmp/py3status_oy4ntfnq
           6969 Xwayland :0 -rootless -terminate -core -listen 29 -listen 30 - 
       init.scope (#10198)
         5909 /lib/systemd/systemd --user
         5911 (sd-pam)
     session-7.scope (#10440)
       5895 gdm-session-worker [pam/gdm-password]
       6028 /usr/libexec/gdm-wayland-session --register-session sway-user-serv 
[...]
I think that's pretty neat.
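The readiness notification mentioned above follows the sd_notify protocol: the service sends a short datagram such as `READY=1` to the Unix socket named in `$NOTIFY_SOCKET`. The following is only a minimal sketch of that protocol, not the contents of the sway/config referenced above (which is not reproduced here); the function name `sd_notify` is borrowed from the C API and everything else is an assumption for illustration.

```python
import os
import socket


def sd_notify(message="READY=1"):
    """Send a notification datagram to the service manager, if one is listening.

    Returns False when NOTIFY_SOCKET is unset, which is the case when the
    process was not started by systemd as a notify-type service.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a socket in the Linux abstract namespace.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True
```

A unit that relies on this would typically be declared with `Type=notify`, so that systemd waits for the `READY=1` message before starting dependent units.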

Environment propagation

At first, my terminals and rofi didn't have the right $PATH, which broke a lot of my workflow. It's hard to tell exactly how Wayland gets started or where to inject environment. This discussion suggests a few alternatives and this Debian bug report discusses this issue as well. I eventually picked environment.d(5) since I already manage my user session with systemd, and it fixes a bunch of other problems. I used to have a .shenv that I had to manually source everywhere. The only problem with that approach is that it doesn't support conditionals, but that's something that's rarely needed.
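To illustrate why the lack of conditionals matters little, here is a rough sketch of how `KEY=VALUE` lines of the kind environment.d(5) accepts are evaluated in order, each one able to refer to variables set earlier. This is a simplified model, not systemd's parser: it only handles `$VAR` and `${VAR}`, and ignores the `${VAR:-default}` and `${VAR:+alt}` forms. The file name and values in the sample are hypothetical.

```python
import re

# Matches $VAR or ${VAR}; the more advanced expansion forms are not handled.
_VAR = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def parse_environment_d(text, base_env=None):
    """Evaluate KEY=VALUE lines top to bottom and return the resulting mapping."""
    env = dict(base_env or {})
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip().strip('"')
        value = _VAR.sub(lambda m: env.get(m.group(1) or m.group(2), ""), value)
        env[key.strip()] = value
    return env


# Hypothetical ~/.config/environment.d/10-path.conf
sample = """
# prepend a personal bin directory
PATH=$HOME/bin:$PATH
EDITOR=emacsclient
"""

result = parse_environment_d(sample, {"HOME": "/home/user", "PATH": "/usr/bin"})
print(result["PATH"])
```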

Pipewire

This is a whole topic unto itself, but migrating to Wayland also involves using Pipewire if you want screen sharing to work. You can actually keep using Pulseaudio for audio, that said, but that migration is something I've wanted to do anyways: Pipewire's design seems much better than Pulseaudio's, as it folds in JACK features which allow for pretty neat tricks. (Which I should probably show in a separate post, because this one is getting rather long.)

I first tried this migration in Debian bullseye, and it didn't work very well. Ardour would fail to export tracks and I would get into weird situations where streams would just drop mid-way. A particularly funny incident was when I was in a meeting and I couldn't hear my colleagues speak anymore (but they could hear me) and I went on blabbering on my own for a solid 5 minutes until I realized what was going on. By then, people had tried numerous ways of letting me know that something was off, including (apparently) coughing, saying "hello?", chat messages, IRC, and so on, until they just gave up and left. I suspect that was also a Pipewire bug, but it could also have been that I muted the tab by error, as I recently learned that clicking on the little tiny speaker icon on a tab mutes that tab. Since the tab itself can get pretty small when you have lots of them, I actually mute tabs by mistake quite frequently.

Anyways. Point is: I already knew how to make the migration, and I had already documented how to make the change in Puppet. It's basically:
apt install pipewire pipewire-audio-client-libraries pipewire-pulse wireplumber 
Then, as a regular user:
systemctl --user daemon-reload
systemctl --user --now disable pulseaudio.service pulseaudio.socket
systemctl --user --now enable pipewire pipewire-pulse
systemctl --user mask pulseaudio
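One way to check that the switch took effect is to look at which server answers `pactl info`. The sketch below parses that output; it assumes a `Server Name:` line that mentions PipeWire when pipewire-pulse is serving clients, which matches output I have seen but should be treated as an assumption. The sample text and version number are illustrative, not captured from the system described here.

```python
def audio_server(pactl_info):
    """Return the value of the 'Server Name' field from `pactl info` output."""
    for line in pactl_info.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Server Name":
            return value.strip()
    return None


def is_pipewire(pactl_info):
    """True if the PulseAudio-compatible server appears to be PipeWire."""
    name = audio_server(pactl_info)
    return name is not None and "pipewire" in name.lower()


# Illustrative sample only; run `pactl info` to get the real thing.
sample = (
    "Server String: /run/user/1000/pulse/native\n"
    "Server Name: PulseAudio (on PipeWire 0.3.65)\n"
)
print(is_pipewire(sample))
```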
An optional (but key, IMHO) configuration you should also make is to "switch on connect", which will make your Bluetooth or USB headset automatically be the default route for audio, when connected. In ~/.config/pipewire/pipewire-pulse.conf.d/autoconnect.conf:
context.exec = [
    { path = "pactl"        args = "load-module module-always-sink" }
    { path = "pactl"        args = "load-module module-switch-on-connect" }
    #{ path = "/usr/bin/sh"  args = "~/.config/pipewire/default.pw" }
]
See the excellent as usual Arch wiki page about Pipewire for that trick and more information about Pipewire. Note that you must not put the file in ~/.config/pipewire/pipewire.conf (or pipewire-pulse.conf, maybe) directly, as that will break your setup. If you want to add to that file, first copy the template from /usr/share/pipewire/pipewire-pulse.conf. So far I'm happy with Pipewire in bookworm, but I've heard mixed reports about it. I have high hopes it will become the standard media server for Linux in the coming months or years, which is great because I've been (rather boldly, I admit) on the record saying I don't like PulseAudio. Rereading this now, I feel it might have been a little unfair, as "over-engineered and tries to do too many things at once" applies probably even more to Pipewire than PulseAudio (since it also handles video dispatching). That said, I think Pipewire took the right approach by implementing existing interfaces like Pulseaudio and JACK. That way we're not adding a third (or fourth?) way of doing audio in Linux; we're just making the server better.

Keypress drops

Sometimes I lose keyboard presses. This correlates with the following warning from Sway:
déc 06 10:36:31 curie sway[343384]: 23:32:14.034 [ERROR] [wlr] [libinput] event5  - SONiX USB Keyboard: client bug: event processing lagging behind by 37ms, your system is too slow
... and corresponds to an open bug report in Sway. It seems the "system is too slow" should really be "your compositor is too slow" which seems to be the case here on this older system (curie). It doesn't happen often, but it does happen, particularly when a bunch of busy processes start in parallel (in my case: a linter running inside a container and notmuch new). The proposed fix for this in Sway is to gain real time privileges and add the CAP_SYS_NICE capability to the binary. We'll see how that goes in Debian once 1.8 gets released and shipped.

Improvements over i3

Tiling improvements

There are a lot of improvements Sway could bring over plain i3. There are pretty neat auto-tilers that could replicate the configurations I used to have in Xmonad or Awesome, see:

Display latency tweaks

TODO: You can tweak the display latency in wlroots compositors with the max_render_time parameter, possibly getting lower latency than X11 in the end.

Sound/brightness changes notifications

TODO: Avizo can display a pop-up to give feedback on volume and brightness changes. Not in Debian. Other alternatives include SwayOSD and sway-nc, also not in Debian.

Debugging tricks

The xeyes program (in the x11-apps package) will run in Wayland, and can be used to easily see whether a given window is also running natively in Wayland. If the "eyes" follow the cursor over a window, that app is actually running in Xwayland, so not natively in Wayland. Another way to see what is using Wayland in Sway is with the command:
swaymsg -t get_tree
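The output of that command is a JSON tree, so it can be walked programmatically. The sketch below assumes each view node carries a `shell` field (`xdg_shell` for native Wayland clients, `xwayland` for X11 ones), an `app_id` for native clients and a `window_properties` mapping for Xwayland ones; that is my reading of sway-ipc(7) and should be checked against your Sway version. The sample tree is hand-written for illustration.

```python
def iter_views(node):
    """Yield every node in the tree that represents an application window."""
    if node.get("shell"):
        yield node
    for child in node.get("nodes", []) + node.get("floating_nodes", []):
        yield from iter_views(child)


def classify(tree):
    """Map a window label to the shell it uses (xdg_shell or xwayland)."""
    result = {}
    for view in iter_views(tree):
        props = view.get("window_properties") or {}
        label = view.get("app_id") or props.get("class") or view.get("name")
        result[label] = view["shell"]
    return result


# Hand-written sample; real input would come from
# json.loads() applied to the output of `swaymsg -t get_tree`.
sample_tree = {
    "name": "root",
    "floating_nodes": [],
    "nodes": [
        {
            "name": "workspace 1",
            "floating_nodes": [],
            "nodes": [
                {"name": "foot", "app_id": "foot", "shell": "xdg_shell",
                 "nodes": [], "floating_nodes": []},
                {"name": "xeyes", "app_id": None, "shell": "xwayland",
                 "window_properties": {"class": "XEyes"},
                 "nodes": [], "floating_nodes": []},
            ],
        }
    ],
}
print(classify(sample_tree))
```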

Other documentation

Conclusion

In general, this took me a long time, but it mostly works. The tray icon situation is pretty frustrating, but there's a workaround and I have high hopes it will eventually fix itself. I'm also worried about the DisplayLink support because I eventually want to use it, but hopefully that's another thing that will fix itself before I need it.

A word on the security model

I'm kind of worried about all the hacks that have been added to Wayland just to make things work. Pretty much everywhere we need to, we punched a hole in the security model: Wikipedia describes the security properties of Wayland as it "isolates the input and output of every window, achieving confidentiality, integrity and availability for both." I'm not sure those are actually realized in the actual implementation, because of all those holes punched in the design, at least in Sway. For example, apparently the GNOME compositor doesn't have the virtual-keyboard protocol, but they do have (another?!) text input protocol. Wayland does offer a better basis to implement such a system, however. It feels like the Linux applications security model lacks critical decision points in the UI, like the user approving "yes, this application can share my screen now". Applications themselves might have some of those prompts, but it's not mandatory, and that is worrisome.

1 November 2022

Jonathan Dowland: Halloween playlist 2022

I hope you had a nice Halloween! I've collected together some songs that I've enjoyed over the last couple of years that loosely fit a theme: ambient, instrumental, experimental, industrial, dark, disconcerting, etc. I've prepared a Spotify playlist of most of them, but not all. The list is inline below as well, with many (but not all) tracks linking to Bandcamp, if I could find them there. This is a bit late, sorry. If anyone listens to something here and has any feedback I'd love to hear it. (If you are reading this on an aggregation site, it's possible the embeds won't work. If so, click through to my main site.) Spotify playlist: https://open.spotify.com/playlist/3bEvEguRnf9U1RFrNbv5fk?si=9084cbf78c364ac8 The list, with Bandcamp embeds where possible:

Some sources
  1. Via Stuart Maconie's Freak Zone
  2. Via Mary Anne Hobbs
  3. Via Lose yourself with
  4. Soma FM - Doomed (Halloween Special)

14 October 2022

Shirish Agarwal: Dowry, Racism, Railways

Dowry

A few days back, I posted about the movie Raksha Bandhan and what I felt about it. Sadly, just a couple of days back, somebody shared this link. Part of me was shocked and part of me was not. A couple of acquaintances of mine had said the same thing about their daughters in the past. In such situations you are generally left speechless because you don't know what the right thing to do is. If he has shared it with you, an outsider, how many times must he have told the same to his wife and daughters? And from what little I have gathered in life, many people have justified it on similar lines. And while there were protests, sadly the book was not removed. Now if nurses are reading such literature, you can tell how their thought process might be forming :(. And these are the ones whom we call for when we are sick and tired :(. And I have not taken into account how the girls/women themselves might be feeling. There are similar things in another country, though probably not the same, nor with the same motivations, although a feeling of helplessness would be common to both. But such statements are not alone. Another gentleman, in a slightly different context, shared this as well:
The above is a statement shared in a book recommended for the CTET (Central Teacher's Eligibility Test, which became mandatory once the RTE (Right To Education) Act came in). The statement says: "People from cold places are white, beautiful, well-built, healthy and wise. And people from hot places are black, irritable and of violent nature." Now, I can agree with one part of the statement, that people residing in colder regions are fairer than others, but there are loads of other factors that determine fairness or skin color/skin pigmentation. After a bit of searching, I came to know that this and similar articulations have been made in an idea/work called "Environmental Determinism". Now if you look at that page, you will realize this is what colonialism is and was all about: the idea that the white man had a god-given right to rule over others. Similarly, if you are fair, you can lord over others. It seems simplistic, yet it has a powerful hold on many people in India. Forget the common man, this thinking is and was applicable to some of our better-known freedom fighters. Pune's own Bal Gangadhar Tilak wrote The Arctic Home in the Vedas. It sort of talks about Aryans and how they invaded India and settled here. I haven't read the book, nor do I have access to it, so I have to rely on third-party sources. The reason I'm sharing all this is that the right-wing has been doing this myth-making for some time now, and unless and until you shine a light on it, it will continue to perpetuate itself. Those who have read this blog know that India is and has been mired in casteism forever. They even took the "fair" comment and applied it to all Brahmins. According to them, all Brahmins are fair and hence have a god-given right to lord over others. What is called the Eton boys' network serves the same purpose in this casteism. The only solution is to put those ideas under the limelight and investigate them.
To take the above, how does one prove that all fair people are wise and peaceful while all black and brown people are violent? If that is so, how does one account for Mahatma Gandhi, Martin Luther King Junior, Nelson Mandela, Michael Jackson? The list is probably endless. And not to forget that when Mahatma Gandhiji led his nonviolent movements, either in India or in South Africa, both black and brown people took part in the millions. There are similar examples with Martin Luther King Jr. I know and have read of so many non-violent civil movements that took place in the U.S., for example Rosa Parks and the Montgomery Bus Boycott. So just based on these examples, one can conclude that at least the part about the fair having exclusive rights to being fair and noble is not correct. Now as far as violence goes, every race and every community has committed violence in the past or been a victim of it. So no one is or can be blameless, although in light of the above statement, the question can be asked: who were the Vikings? Both popular imagination and serious history share stories about the Vikings. The Vikings were somewhat nomadic in nature; even though they had permanent settlements, they still went on raids, raped women, captured both men and women and sold them as slaves. So they are what pirates came to be, but not the kind Hollywood romanticizes. Europe itself has been a tale of conflict since time immemorial. It is only after the formation of the EU that most of these countries stopped fighting each other. From a historical perspective, that is too new. So even the part about the fair being non-violent dies in the face of this evidence. I could go on, but this is enough on that topic.

Railways and Industrial Action around the World

While I have shared about Railways so many times on this blog, it continues to fascinate me how people don't understand the first things about Railways. For example, Railways is a natural monopoly. What that means is that if you look at any type of privatization around the world, you will see it is a monopoly. Unlike the road or the skies, Railways is and always will be limited by infrastructure and the ability to build new infrastructure. Unlike on the road or in the skies (even they have their limits), you cannot run train services on a whim. At any particular point in time, only a single train could and should occupy a stretch of the Railway network. You could have more trains on one line, but then front or rear-end collisions become a real possibility. You also need all sorts of good and reliable communications and redundant infrastructure, so that if one thing fails you have something else in place. The reason is that a single train can carry anywhere from 2000 to 5000 passengers or more. While this is true of Indian Railways, Railways around the world probably have similar numbers. It is in this light that I share the videos below.
To be more precise, see the fuller video
Now to give context to the recording above, Mick Lynch is the general secretary of the RMT. For those who came in late, both the UK and the U.S. have been threatened by railway strikes, and the reason for the strikes or threats of strikes is similar. From the company's perspective, all they care about is investing less and making the most profit to hand to equity shareholders. At the same time, they have frozen the salaries of railway workers for the last 3 years, while the politicians who were asking the questions apparently gave themselves a raise twice this year. They are asking them to negotiate at 8% while inflation in the UK has been 12.3% and is projected to go higher. And it is not only the money. Since the 1990s, when the UK privatized the Railways, they stopped investing in the infrastructure. That meant that the UK Railway infrastructure fell behind over time, and is even behind, say, Indian Railways, which used to provide the most bang for the buck. And Indian Railways is far from ideal. Ironically, most of the operators in the UK are the nationalized Railways of France, Germany etc., but after the hard Brexit, they too are mulling cutting their operations short; they have to. There is also the EU Entry/Exit system that will come next year. Why am I sharing what is happening in UK Rail? Because the Indian Government wants to follow the same path, while fooling the public by saying we would do it better. What will inevitably happen is that ticket prices go up, people no longer use the service, the number of services goes down and eventually they are cancelled. This has happened both in Indian Railways as well as in Airlines. In fact, the GOI announced a credit scheme just a few days back to help Airlines stay afloat. I was chatting with a friend who had come down to Pune from Chennai, and the round-trip cost him INR 15k/- on that single trip alone.
We reminisced how a few years ago, 8 years to be precise, we could buy an air ticket for 2.5k/- just a few days before the trip, and did so. I remember doing at least a dozen odd trips by air in the years before 2014. My friend used to come to Pune almost every weekend because he could afford it; now he can't do that. And these are people who are in the top 5-10% of the population. And this is not just in the UK, but also in the United States. There is one big difference though: U.S. rail is mainly a freight carrier, while the UK Railway operations are mostly passenger based. What was and is interesting is that Scotland had to nationalize its services as it realized the operators cannot or will not function when they are most needed. Most of the public, even in the UK, seem to want a nationalized rail service; at least their polls say so. So it will definitely be interesting to see what happens in the UK next year. In the end, I know I promised to share about books, but the above incidents have just been too fascinating not to share, along with what I think about them. Free markets function well where there is competition, for example what is and has been happening in China for EVs, but not where you have natural monopolies. In all Railway privatization, you have to hand over the area to one operator, and then they have no motivation. If you have multiple operators, there will always be haggling as to who will run the train and at what time. In either scenario, it doesn't work, and it raises prices while not delivering anything better. I do take examples from the UK because a lot of things in India are still the legacy of the British. The whole civil department that was created in 1953 is/was a copy of the British civil department at that time, and it is to this day. P.S. Just came to know that Kwasi Kwarteng was just sacked as UK Chancellor.
I do commend Truss for facing the press even though she might be dumped a week later, unlike our PM who hasn't faced a single press conference in the last 8 odd years.

https://www.youtube.com/watch?v=oTP6ogBqU7of
The difference in Indian and UK politics seems to be that the English are now asking questions while here in India, most people are still sleeping without a care in the world. Another thing to note: Minidebconf Palakkad is gonna happen on 12-13th November 2022. I am probably not gonna go, but would request everyone who wants to do something in free software to attend it. I am not sure whether I would be of any use like this, and also when I get back, it would be to an empty house. But for people young and old who want to do anything with free/open source software, it is a chance not to be missed. Registration for the same closes on 1st of November 2022. All the best, break a leg. Just read this, beautifully done.

10 October 2022

Ian Jackson: Skipping releases when upgrading Debian systems

Debian does not officially support upgrading from earlier than the previous stable release: you're not supposed to skip releases. Instead, you're supposed to upgrade to each intervening major release in turn. However, skipping intervening releases does, in fact, often work quite well. Apparently, this is surprising to many people, even Debian insiders. I was encouraged to write about it some more.

My personal experience

I have three conventionally-managed personal server systems (by which I mean systems which aren't reprovisioned by some kind of automation). Of these at least two have been skip upgraded at least once:

The one I don't think I've skip-upgraded (at least, not recently) is my house network manager (and now VM host) which I try to keep to a minimum in terms of functionality and which I keep quite up to date. It was crossgraded from i386 (32-bit) to amd64 (64-bit) fairly recently, which is a thing that Debian isn't sure it supports. The crossgrade was done in a hurry and without any planning, prompted by Spectre et al suddenly requiring big changes to Xen. But it went well enough.

My home "does random stuff" server (media server, web cache, printing, DNS, backups etc.) has etckeeper records starting in 2015. I upgraded directly from jessie (Debian 8) to buster (Debian 10). I think it has probably had earlier skip upgrade(s): the oldest file in /etc is from December 1996 and I have been doing occasional skip upgrades as long as I can remember.

And of course there's chiark, which is one of the oldest Debian installs in existence. I wrote about the most recent upgrade, where I went directly from jessie i386 ELTS (32-bit Debian 8) to bullseye amd64 (64-bit Debian 11). That was a very extreme case which required significant planning and pre-testing, since the package dependencies were in no way sufficient for the proper ordering. But, I don't normally go to such lengths. Normally, even on chiark, I just edit the sources.list and see what apt proposes to do.
I often skip upgrade chiark because I tend to defer risky-looking upgrades, partly in the hope of others fixing the bugs while I wait :-), and partly just because change is disruptive and amortising it is very helpful both to me and my users. I have some records of chiark's upgrades from my announcements to users. As well as the recent "skip skip up cross grade, direct", I definitely did a skip upgrade of chiark from squeeze (Debian 6) to jessie (Debian 8). It appears that the previous skip upgrade on chiark was rex (Debian 1.2) to hamm (Debian 2.0). I don't think it's usual for me to choose to do a multi-release upgrade the "officially supported" way, in two (or more) stages, on a server. I have done that on systems with a GUI desktop setup, but even then I usually skip the intermediate reboot(s).

When to skip upgrade (and what precautions to take)

I'm certainly not saying that everyone ought to be doing this routinely. Most users with a Debian install that is older than oldstable probably ought to reinstall it, or do the two-stage upgrade. Skip upgrading almost always runs into some kind of trouble (albeit, usually trouble that isn't particularly hard to fix if you know what you're doing). However, officially supported non-skip upgrades go wrong too. Doing a two-or-more-releases upgrade via the intermediate releases can expose you to significant bugs in the intermediate releases, which were later fixed. Because Debian's users and downstreams are cautious, and Debian itself can be slow, it is common for bugs to appear for one release and then be fixed only in the next. Paradoxically, this seems to be especially true with the kind of big and scary changes where you'd naively think the upgrade mechanisms would break if you skipped the release where the change first came in. I would not recommend a skip upgrade to someone who is not a competent Debian administrator, with good familiarity with Debian package management, including use of dpkg directly to fix things up.
You should have a mental toolkit of manual bug workaround techniques. I should always make sure that I have rescue media (and in the case of a remote system, full remote access including ability to boot a different image), although I don't often need it. And, when considering a skip upgrade, you should be aware of the major changes that have occurred in Debian. Skip upgrading is more likely to be a good idea with a complex and highly customised system: a fairly vanilla install is not likely to encounter problems during a two-stage update. (And, a vanilla system can be upgraded by reinstalling.) I haven't recently skip upgraded a laptop or workstation. I doubt I would attempt it; modern desktop software seems to take a much harder line about breaking things that are officially unsupported, and generally trying to force everyone into the preferred mold.

A request to Debian maintainers

I would like to encourage Debian maintainers to defer removing upgrade compatibility machinery until it is actually getting in the way, or has become hazardous, or many years obsolete. Examples of the kinds of things which it would be nice to keep, and which do not usually cause much inconvenience to retain, are dependency declarations (particularly, alternatives), and (many) transitional fragments in maintainer scripts. If you find yourself needing to either delete some compatibility feature, or refactor/reorganise it, I think it is probably best to delete it. If you modify it significantly, the resulting thing (which won't be tested until someone uses it in anger) is quite likely to have bugs which make it go wrong more badly (or, more confusingly) than the breakage that would happen without it. Obviously this is all a judgement call. I'm not saying Debian should formally support skip upgrades, to the extent of (further) slowing down important improvements. Nor am I asking for any change to the routine approach to (for example) transitional packages (i.e.
the technique for ensuring continuity of installation when a package name changes). We try to make release upgrades work perfectly; but skip upgrades don't have to work perfectly to be valuable. Retaining compatibility code can also make it easier to provide official backports, and it probably helps downstreams with different release schedules. The fact that maintainers do in practice often defer removing compatibility code provides useful flexibility and options to at least some people. So it would be nice if you'd at least not go out of your way to break it.

Building on older releases

I would also like to encourage maintainers to provide source packages in Debian unstable that will still build on older releases, where this isn't too hard and the resulting binaries might be basically functional. Speaking personally, it's not uncommon for me to rebuild packages from unstable and install them on much older releases. This is another thing that is not officially supported, but which often works well. I'm not saying to contort your build system, or delay progress. You'll definitely want to depend on a recentish debhelper. But, for example, retaining old build-dependency alternatives is nice. Retaining them doesn't constitute a promise that it works - it just makes life slightly easier for someone who is going off piste. If you know you have users on multiple distros or multiple releases, and wish to fully support them, you can go further, of course. Many of my own packages are directly buildable, or even directly installable, on older releases.
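The "edit the sources.list and see what apt proposes" step described earlier amounts to swapping the release codename in the suite field of each entry. The following is only a sketch of that text edit, under the assumption of classic one-line `deb` entries; the function names are hypothetical, it does not touch `.sources` (deb822) files, and it does nothing about the planning a real skip upgrade needs.

```python
import re

# type, optional [options], URI, then the suite field we want to change.
_ENTRY = re.compile(r"^(\s*deb(?:-src)?\s+(?:\[[^\]]*\]\s+)?\S+\s+)(\S+)(.*)$")


def retarget_line(line, old, new):
    """Replace the codename in the suite field only, leaving the URI alone."""
    match = _ENTRY.match(line)
    if not match:
        return line  # comments, blank lines, anything unrecognised
    prefix, suite, rest = match.groups()
    # Also covers suites such as "jessie-updates" or "jessie/updates".
    suite = re.sub(rf"^{re.escape(old)}(?=$|[-/])", new, suite)
    return prefix + suite + rest


def retarget_sources(text, old, new):
    return "\n".join(retarget_line(line, old, new) for line in text.splitlines())


sample = """# main archive
deb http://deb.debian.org/debian jessie main
deb http://security.debian.org/ jessie/updates main contrib"""

print(retarget_sources(sample, "jessie", "buster"))
```

Note that suite naming is not always stable across releases (the security suite was renamed starting with bullseye), so the result still needs to be read by a human before running apt.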


29 September 2022

Antoine Beaupré: Detecting manual (and optimizing large) package installs in Puppet

Well this is a mouthful. I recently worked on a neat hack called puppet-package-check. It is designed to warn about manually installed packages, to make sure "everything is in Puppet". But it turns out it can (probably?) dramatically decrease the Puppet bootstrap time when it needs to install a large number of packages.
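At its core, the check described here is a set difference: packages marked as manually installed on the host, minus packages that Puppet manages. The sketch below shows only that idea with made-up package lists; it is not the logic of puppet-package-check itself, and the function name is hypothetical. On Debian, the manual list could come from `apt-mark showmanual`, while how the Puppet side is collected is left out.

```python
def unmanaged_packages(apt_manual, puppet_managed):
    """Return manually installed packages that Puppet does not know about."""
    return sorted(set(apt_manual) - set(puppet_managed))


# Made-up inputs for illustration.
apt_manual = ["emacs", "figlet", "restic", "sway"]
puppet_managed = ["emacs", "sway", "syncthing"]

unmanaged = unmanaged_packages(apt_manual, puppet_managed)
print(f"{len(unmanaged)} unmanaged packages found")
print(" ".join(unmanaged))
```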

Detecting manual packages

On a cleanly filed workstation, it looks like this:
root@emma:/home/anarcat/bin# ./puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
0 unmanaged packages found
A messy workstation will look like this:
root@curie:/home/anarcat/bin# ./puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
288 unmanaged packages found
apparmor-utils beignet-opencl-icd bridge-utils clustershell cups-pk-helper davfs2 dconf-cli dconf-editor dconf-gsettings-backend ddccontrol ddrescueview debmake debootstrap decopy dict-devil dict-freedict-eng-fra dict-freedict-eng-spa dict-freedict-fra-eng dict-freedict-spa-eng diffoscope dnsdiag dropbear-initramfs ebtables efibootmgr elpa-lua-mode entr eog evince figlet file file-roller fio flac flex font-manager fonts-cantarell fonts-inconsolata fonts-ipafont-gothic fonts-ipafont-mincho fonts-liberation fonts-monoid fonts-monoid-tight fonts-noto fonts-powerline fonts-symbola freeipmi freetype2-demos ftp fwupd-amd64-signed gallery-dl gcc-arm-linux-gnueabihf gcolor3 gcp gdisk gdm3 gdu gedit gedit-plugins gettext-base git-debrebase gnome-boxes gnote gnupg2 golang-any golang-docker-credential-helpers golang-golang-x-tools grub-efi-amd64-signed gsettings-desktop-schemas gsfonts gstreamer1.0-libav gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-pulseaudio gtypist gvfs-backends hackrf hashcat html2text httpie httping hugo humanfriendly iamerican-huge ibus ibus-gtk3 ibus-libpinyin ibus-pinyin im-config imediff img2pdf imv initramfs-tools input-utils installation-birthday internetarchive ipmitool iptables iptraf-ng jackd2 jupyter jupyter-nbextension-jupyter-js-widgets jupyter-qtconsole k3b kbtin kdialog keditbookmarks keepassxc kexec-tools keyboard-configuration kfind konsole krb5-locales kwin-x11 leiningen lightdm lintian linux-image-amd64 linux-perf lmodern lsb-base lvm2 lynx lz4json magic-wormhole mailscripts mailutils manuskript mat2 mate-notification-daemon mate-themes mime-support mktorrent mp3splt mpdris2 msitools mtp-tools mtree-netbsd mupdf nautilus nautilus-sendto ncal nd ndisc6 neomutt net-tools nethogs nghttp2-client nocache npm2deb ntfs-3g ntpdate nvme-cli nwipe obs-studio okular-extra-backends openstack-clients openstack-pkg-tools paprefs pass-extension-audit pcmanfm pdf-presenter-console pdf2svg percol pipenv 
playerctl plymouth plymouth-themes popularity-contest progress prometheus-node-exporter psensor pubpaste pulseaudio python3-ldap qjackctl qpdfview qrencode r-cran-ggplot2 r-cran-reshape2 rake restic rhash rpl rpm2cpio rs ruby ruby-dev ruby-feedparser ruby-magic ruby-mocha ruby-ronn rygel-playbin rygel-tracker s-tui sanoid saytime scrcpy scrcpy-server screenfetch scrot sdate sddm seahorse shim-signed sigil smartmontools smem smplayer sng sound-juicer sound-theme-freedesktop spectre-meltdown-checker sq ssh-audit sshuttle stress-ng strongswan strongswan-swanctl syncthing system-config-printer system-config-printer-common system-config-printer-udev systemd-bootchart systemd-container tardiff task-desktop task-english task-ssh-server tasksel tellico texinfo texlive-fonts-extra texlive-lang-cyrillic texlive-lang-french texlive-lang-german texlive-lang-italian texlive-xetex tftp-hpa thunar-archive-plugin tidy tikzit tint2 tintin++ tipa tpm2-tools traceroute tree trocla ucf udisks2 unifont unrar-free upower usbguard uuid-runtime vagrant-cachier vagrant-libvirt virt-manager vmtouch vorbis-tools w3m wamerican wamerican-huge wfrench whipper whohas wireshark xapian-tools xclip xdg-user-dirs-gtk xlax xmlto xsensors xserver-xorg xsltproc xxd xz-utils yubioath-desktop zathura zathura-pdf-poppler zenity zfs-dkms zfs-initramfs zfsutils-linux zip zlib1g zlib1g-dev
157 old: apparmor-utils clustershell davfs2 dconf-cli dconf-editor ddccontrol ddrescueview decopy dnsdiag ebtables efibootmgr elpa-lua-mode entr figlet file-roller fio flac flex font-manager freetype2-demos ftp gallery-dl gcc-arm-linux-gnueabihf gcolor3 gcp gdu gedit git-debrebase gnote golang-docker-credential-helpers golang-golang-x-tools gtypist hackrf hashcat html2text httpie httping hugo humanfriendly iamerican-huge ibus ibus-pinyin imediff input-utils internetarchive ipmitool iptraf-ng jackd2 jupyter-qtconsole k3b kbtin kdialog keditbookmarks keepassxc kexec-tools kfind konsole leiningen lightdm lynx lz4json magic-wormhole manuskript mat2 mate-notification-daemon mktorrent mp3splt msitools mtp-tools mtree-netbsd nautilus nautilus-sendto nd ndisc6 neomutt net-tools nethogs nghttp2-client nocache ntpdate nwipe obs-studio openstack-pkg-tools paprefs pass-extension-audit pcmanfm pdf-presenter-console pdf2svg percol pipenv playerctl qjackctl qpdfview qrencode r-cran-ggplot2 r-cran-reshape2 rake restic rhash rpl rpm2cpio rs ruby-feedparser ruby-magic ruby-mocha ruby-ronn s-tui saytime scrcpy screenfetch scrot sdate seahorse shim-signed sigil smem smplayer sng sound-juicer spectre-meltdown-checker sq ssh-audit sshuttle stress-ng system-config-printer system-config-printer-common tardiff tasksel tellico texlive-lang-cyrillic texlive-lang-french tftp-hpa tikzit tint2 tintin++ tpm2-tools traceroute tree unrar-free vagrant-cachier vagrant-libvirt vmtouch vorbis-tools w3m wamerican wamerican-huge wfrench whipper whohas xdg-user-dirs-gtk xlax xmlto xsensors xxd yubioath-desktop zenity zip
131 new: beignet-opencl-icd bridge-utils cups-pk-helper dconf-gsettings-backend debmake debootstrap dict-devil dict-freedict-eng-fra dict-freedict-eng-spa dict-freedict-fra-eng dict-freedict-spa-eng diffoscope dropbear-initramfs eog evince file fonts-cantarell fonts-inconsolata fonts-ipafont-gothic fonts-ipafont-mincho fonts-liberation fonts-monoid fonts-monoid-tight fonts-noto fonts-powerline fonts-symbola freeipmi fwupd-amd64-signed gdisk gdm3 gedit-plugins gettext-base gnome-boxes gnupg2 golang-any grub-efi-amd64-signed gsettings-desktop-schemas gsfonts gstreamer1.0-libav gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-pulseaudio gvfs-backends ibus-gtk3 ibus-libpinyin im-config img2pdf imv initramfs-tools installation-birthday iptables jupyter jupyter-nbextension-jupyter-js-widgets keyboard-configuration krb5-locales kwin-x11 lintian linux-image-amd64 linux-perf lmodern lsb-base lvm2 mailscripts mailutils mate-themes mime-support mpdris2 mupdf ncal npm2deb ntfs-3g nvme-cli okular-extra-backends openstack-clients plymouth plymouth-themes popularity-contest progress prometheus-node-exporter psensor pubpaste pulseaudio python3-ldap ruby ruby-dev rygel-playbin rygel-tracker sanoid scrcpy-server sddm smartmontools sound-theme-freedesktop strongswan strongswan-swanctl syncthing system-config-printer-udev systemd-bootchart systemd-container task-desktop task-english task-ssh-server texinfo texlive-fonts-extra texlive-lang-german texlive-lang-italian texlive-xetex thunar-archive-plugin tidy tipa trocla ucf udisks2 unifont upower usbguard uuid-runtime virt-manager wireshark xapian-tools xclip xserver-xorg xsltproc xz-utils zathura zathura-pdf-poppler zfs-dkms zfs-initramfs zfsutils-linux zlib1g zlib1g-dev
Yuck! That's a lot of shit to go through. Notice how the packages get sorted between "old" and "new" packages. This is because popcon is used as a tool to mark which packages are "old". If you have unmanaged packages, the "old" ones are likely things that you can uninstall, for example. If you don't have popcon installed, you'll also get this warning:
popcon stats not available: [Errno 2] No such file or directory: '/var/log/popularity-contest'
The error can otherwise be safely ignored, but you won't get "help" prioritizing the packages to add to your manifests. Note that the tool ignores packages that were "marked" (see apt-mark(8)) as automatically installed. This implies that you might have to do a little bit of cleanup the first time you run this, as Debian doesn't necessarily mark all of those packages correctly on first install. For example, here's what it looks like on a clean install, after Puppet ran:
root@angela:/home/anarcat# ./bin/puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
127 unmanaged packages found
ca-certificates console-setup cryptsetup-initramfs dbus file gcc-12-base gettext-base grub-common grub-efi-amd64 i3lock initramfs-tools iw keyboard-configuration krb5-locales laptop-detect libacl1 libapparmor1 libapt-pkg6.0 libargon2-1 libattr1 libaudit-common libaudit1 libblkid1 libbpf0 libbsd0 libbz2-1.0 libc6 libcap-ng0 libcap2 libcap2-bin libcom-err2 libcrypt1 libcryptsetup12 libdb5.3 libdebconfclient0 libdevmapper1.02.1 libedit2 libelf1 libext2fs2 libfdisk1 libffi8 libgcc-s1 libgcrypt20 libgmp10 libgnutls30 libgpg-error0 libgssapi-krb5-2 libhogweed6 libidn2-0 libip4tc2 libiw30 libjansson4 libjson-c5 libk5crypto3 libkeyutils1 libkmod2 libkrb5-3 libkrb5support0 liblocale-gettext-perl liblockfile-bin liblz4-1 liblzma5 libmd0 libmnl0 libmount1 libncurses6 libncursesw6 libnettle8 libnewt0.52 libnftables1 libnftnl11 libnl-3-200 libnl-genl-3-200 libnl-route-3-200 libnss-systemd libp11-kit0 libpam-systemd libpam0g libpcre2-8-0 libpcre3 libpcsclite1 libpopt0 libprocps8 libreadline8 libselinux1 libsemanage-common libsemanage2 libsepol2 libslang2 libsmartcols1 libss2 libssl1.1 libssl3 libstdc++6 libsystemd-shared libsystemd0 libtasn1-6 libtext-charwidth-perl libtext-iconv-perl libtext-wrapi18n-perl libtinfo6 libtirpc-common libtirpc3 libudev1 libunistring2 libuuid1 libxtables12 libxxhash0 libzstd1 linux-image-amd64 logsave lsb-base lvm2 media-types mlocate ncurses-term pass-extension-otp puppet python3-reportbug shim-signed tasksel ucf usr-is-merged util-linux-extra wpasupplicant xorg zlib1g
popcon stats not available: [Errno 2] No such file or directory: '/var/log/popularity-contest'
Normally, there should be no unmanaged packages here. But because of the way Debian is installed, a lot of libraries and some core packages are marked as manually installed, and are of course not managed through Puppet. There are two solutions to this problem:
  • really manage everything in Puppet (argh)
  • mark packages as automatically installed
I typically choose the second path and mark a ton of stuff as automatic. Then either they will be auto-removed, or will stop being listed. In the above scenario, one could mark all libraries as automatically installed with:
apt-mark auto $(./bin/puppet-package-check | grep -o 'lib[^ ]*')
... but if you trust that most of that stuff is actually garbage that you don't really want installed anyways, you could just mark it all as automatically installed:
apt-mark auto $(./bin/puppet-package-check)
In my case, that ended up keeping basically all libraries (because of course they're installed for some reason) and auto-removing this:
dh-dkms discover-data dkms libdiscover2 libjsoncpp25 libssl1.1 linux-headers-amd64 mlocate pass-extension-otp pass-otp plocate x11-apps x11-session-utils xinit xorg
You'll notice xorg in there: yep, that's bad. Not what I wanted. But for some reason, on other workstations, I did not actually have xorg installed. Turns out having xserver-xorg is enough, and that one has dependencies. So now I guess I just learned to stop worrying and live without X(org).
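
The check described in this section boils down to a set difference (manually installed packages minus Puppet-managed ones), followed by an old/new split driven by popcon data. The following is only a rough sketch of that idea, not the actual puppet-package-check code: the function names, the sample package lists and the assumed popularity-contest log layout (atime, ctime, package, most recent file, optional tag such as <OLD>) are all illustrative assumptions.

```python
# Hypothetical sketch: names and sample data are illustrative and are not
# taken from puppet-package-check itself.


def unmanaged_packages(manual, puppet_managed):
    """Packages marked as manually installed that Puppet does not manage."""
    return sorted(set(manual) - set(puppet_managed))


def parse_popcon_old(log_text):
    """Return the package names tagged <OLD> in a popularity-contest log.

    Assumes data lines of the form:
        <atime> <ctime> <package> <most-recent-file> [<TAG>]
    """
    old = set()
    for line in log_text.splitlines():
        fields = line.split()
        # Skip header/footer lines and anything too short to be a data line.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        if fields[-1] == "<OLD>":
            old.add(fields[2])
    return old


def split_old_new(packages, old_flags):
    """Split packages into (old, new); packages without popcon data count as new."""
    old = [p for p in packages if p in old_flags]
    new = [p for p in packages if p not in old_flags]
    return old, new


# Made-up log excerpt, used only to exercise the parser.
SAMPLE_LOG = """\
POPULARITY-CONTEST-0 TIME:1670000000 ID:0123456789abcdef ARCH:amd64 POPCONVER:1.76 VENDOR:Debian
1669939200 1660000000 figlet /usr/bin/figlet <OLD>
1669939200 1669900000 evince /usr/bin/evince
END-POPULARITY-CONTEST-0 TIME:1670000100
"""

manual = ["evince", "figlet", "puppet-agent", "zip"]
managed = ["puppet-agent"]

unmanaged = unmanaged_packages(manual, managed)
old, new = split_old_new(unmanaged, parse_popcon_old(SAMPLE_LOG))

print(len(unmanaged), "unmanaged packages found")
print(len(old), "old:", " ".join(old))
print(len(new), "new:", " ".join(new))
```

In this sketch, the "old" list is the one to review first for removal or for apt-mark auto, while the "new" list is the more likely source of packages worth adding to a manifest.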

Optimizing large package installs But that, of course, is not all. Why make things simple when you can have an unreadable title that is trying to be both syntactically correct and click-baity enough to flatter my vain ego? Right. One of the challenges in bootstrapping Puppet with large package lists is that it's slow. Puppet lists packages as individual resources and will basically run apt install $PKG on every package in the manifest, one at a time. While the overhead of apt is generally small, when you add things like apt-listbugs, apt-listchanges, needrestart, triggers and so on, it can take forever to set up a new host. So for initial installs, it can actually make sense to skip the queue and just install everything in one big batch. And because the above tool inspects the packages installed by Puppet, you can run it against a catalog and have a full list of all the packages Puppet would install, even before you have Puppet running. So when reinstalling my laptop, I basically did this:
apt install puppet-agent/experimental
puppet agent --test --noop
apt install $(./puppet-package-check --debug \
    2>&1 | grep ^puppet\ packages |
    sed 's/puppet packages://;s/ /\n/g' |
    grep -v -e onionshare -e golint -e git-sizer -e github-backup -e hledger -e xsane -e audacity -e chirp -e elpa-flycheck -e elpa-lsp-ui -e yubikey-manager -e git-annex -e hopenpgp-tools -e puppet
) puppet-agent/experimental
That massive grep was because there are currently a lot of packages missing from bookworm. Those are all packages that I have in my catalog but that still haven't made it to bookworm. Sad, I know. I eventually worked around that by adding bullseye sources so that the Puppet manifest actually ran. The point here is that this improves the Puppet run time a lot. All packages get installed at once, and you get a nice progress bar. Then you actually run Puppet to deploy configurations and all the other goodies:
puppet agent --test
I wish I could tell you how much faster that ran. I don't know, and I will not go through a full reinstall just to please your curiosity. The only hard number I have is that it installed 444 packages (which exploded into 10,191 packages with dependencies) in a mere 10 minutes. That might also be with the packages already downloaded. In any case, I have that gut feeling it's faster, so you'll have to just trust my gut. It is, after all, much more important than you might think.
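
The batching trick above amounts to turning the "puppet packages:" debug line into one deduplicated argument list, minus the packages that are not installable yet. Here is a small sketch of the same transformation, assuming a debug line shaped like the one the sed expression expects; the sample package names and the batch_install_command helper are hypothetical.

```python
import shlex


def batch_install_command(packages, exclude=(), extra=()):
    """Build a single apt invocation instead of one call per package."""
    skipped = set(exclude)
    wanted = sorted({p for p in packages if p and p not in skipped})
    return ["apt", "install", *wanted, *extra]


# Hypothetical debug line; the real list would come from the tool's output.
catalog_line = "puppet packages: audacity git htop puppet tmux xsane"
names = catalog_line.split(":", 1)[1].split()

cmd = batch_install_command(
    names,
    exclude={"audacity", "xsane", "puppet"},  # e.g. packages missing from the release
    extra=["puppet-agent/experimental"],
)
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps quoting out of the picture if it is later handed to something like subprocess.run.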

Similar work The blueprint system is something similar to this:
It figures out what you've done manually, stores it locally in a Git repository, generates code that's able to recreate your efforts, and helps you deploy those changes to production
That tool has unfortunately been abandoned for a decade at this point. Also note that the APT::AutoRemove::RecommendsImportant and APT::AutoRemove::SuggestsImportant settings are relevant here. If one of them is set to true (the default), a package will not be removed if it is (respectively) a Recommends or Suggests of another package (as opposed to the normal Depends). In other words, if you want to also auto-remove packages that are only Suggests, you would, for example, add this to apt.conf:
APT::AutoRemove::SuggestsImportant "false";
Paul Wise has tried to make the Debian installer and debootstrap properly mark packages as automatically installed in the past, but his bug reports were rejected. The other suggestions in this section are also from Paul, thanks!

28 September 2022

Ian Jackson: Hippotat (IP over HTTP) - first advertised release

I have released version 1.0.0 of Hippotat, my IP-over-HTTP system. To quote the README:
You're in a cafe or a hotel, trying to use the provided wifi. But it's not working. You discover that port 80 and port 443 are open, but the wifi forbids all other traffic.
Never mind, start up your hippotat client. Now you have connectivity. Your VPN and SSH and so on run over Hippotat. The result is not very efficient, but it does work.
Story In early 2017 I was in a mountaintop cafeteria, hoping to do some work on my laptop. (For Reasons I couldn't go skiing that day.) I found that local wifi was badly broken: It had a severe port block. I had to use my port 443 SSH server to get anywhere. My usual arrangements punt everything over my VPN, which uses UDP of course, and I had to bodge several things. Using a web browser directly on the wifi worked normally, of course - otherwise the other guests would have complained. This was not the first experience like this I'd had, but this time I had nothing much else to do but fix it. In a few furious hacking sessions, I wrote Hippotat, a tool for making my traffic look enough like ordinary web browsing that it gets through most stupid firewalls. That Python version of Hippotat served me well for many years, despite being rather shonky, extremely inefficient in CPU (and therefore battery) terms and not very productised. But recently things have started to go wrong. I was using Twisted Python and there was what I think must be some kind of buffer handling bug, which started happening when I upgraded the OS (getting newer versions of Python and the Twisted libraries). The Hippotat code, and the Twisted APIs, were quite convoluted, and I didn't fancy debugging it. So last year I rewrote it in Rust. The new Rust client did very well against my existing servers. To my shame, I didn't get around to releasing it. However, more recently I upgraded the server hosts my Hippotat daemons run on to recent Debian releases. They started to be affected by the bug too, rendering my Rust client unusable. I decided I had to deploy the Rust server code. This involved some packaging work. Having done that, it's time to release it: Hippotat 1.0.0 is out. The package build instructions are rather strange My usual approach to releasing something like this would be to provide a git repository containing a proper Debian source package. 
I might also build binaries, using sbuild, and I would consider actually uploading to Debian. However, despite me taking a fairly conservative approach to adding dependencies to Hippotat, still a couple of the (not very unusual) Rust packages that Hippotat depends on are not in Debian. Last year I considered tackling this head-on, but I got derailed by difficulties with Rust packaging in Debian. Furthermore, the version of the Rust compiler itself in Debian stable is incapable of dealing with recent versions of very many upstream Rust packages, because many packages' most recent versions now require the 2021 Edition of Rust. Sadly, Rust's package manager, cargo, has no mechanism for trying to choose dependency versions that are actually compatible with the available compiler; efforts to solve this problem have still not borne the needed fruit. The result is that, in practice, currently Hippotat has to be built with (a) a reasonably recent Rust toolchain such as found in Debian unstable or obtained from Rust upstream; (b) dependencies obtained from the upstream Rust repository. At least things aren't completely terrible: Rustup itself, despite its alarming install rune, has a pretty good story around integrity, release key management and so on. And with the right build rune, cargo will check not just the versions, but the precise content hashes, of the dependencies to be obtained from crates.io, against the information I provide in the Cargo.lock file. So at least when you build it you can be sure that the dependencies you're getting are the same ones I used myself when I built and tested Hippotat. And there's only 147 of them (counting indirect dependencies too), so what could possibly go wrong? Sadly the resulting package build system cannot work with Debian's best tool for doing clean and controlled builds, sbuild. Under the circumstances, I don't feel I want to publish any binaries.
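
To illustrate what checking "the precise content hashes" means in principle: a lockfile records a digest for each dependency, and a download is accepted only if its digest matches. The sketch below shows that general idea with SHA-256; it is not cargo's implementation, and the verify_checksum helper and the sample bytes are made up for illustration.

```python
import hashlib


def verify_checksum(data: bytes, expected_hex: str) -> bool:
    """Accept the data only if its SHA-256 digest matches the recorded one."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


# Stand-in for a downloaded dependency archive (illustrative bytes only).
crate_bytes = b"pretend this is a downloaded .crate file"

# Stand-in for the digest that a lockfile entry would have recorded earlier.
recorded = hashlib.sha256(crate_bytes).hexdigest()

ok = verify_checksum(crate_bytes, recorded)
tampered = verify_checksum(crate_bytes + b"!", recorded)
print("unmodified accepted:", ok)
print("modified accepted:", tampered)
```

A version number alone says nothing about the bytes that were actually fetched; pinning the digest is what ties a build to the exact content that was tested.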


27 September 2022

Steve McIntyre: Firmware again - updates, how I'm voting and why!

Updates Back in April I wrote about issues with how we handle firmware in Debian, and I also spoke about it at DebConf in July. Since then, we've started the General Resolution process - this led to a lot of discussion on the debian-vote mailing list and we're now into the second week of the voting phase. The discussion has caught the interest of a few news sites along the way: My vote I've also had several people ask me how I'm voting myself, as I started this GR in the first place. I'm happy to oblige! Here's my vote, sorted into preference order:
  [1] Choice 5: Change SC for non-free firmware in installer, one installer
  [2] Choice 1: Only one installer, including non-free firmware
  [3] Choice 6: Change SC for non-free firmware in installer, keep both installers
  [4] Choice 2: Recommend installer containing non-free firmware
  [5] Choice 3: Allow presenting non-free installers alongside the free one
  [6] Choice 7: None Of The Above
  [7] Choice 4: Installer with non-free software is not part of Debian
Why have I voted this way? Fundamentally, my motivation for starting this vote was to ask the project for clear positive direction on a sensible way forward with non-free firmware support. Thus, I've voted all of the options that do that above NOTA. On those terms, I don't like Choice 4 here - IMHO it leaves us in the same unclear situation as before. I'd be happy for us to update the Social Contract for clarity, and I know some people would be much more comfortable if we do that explicitly here. Choice 1 was my initial personal preference as we started the GR, but since then I've been convinced that also updating the SC would be a good idea, hence Choice 5. I'd also rather have a single image / set of images produced, for the two reasons I've outlined before. It's less work for our images team to build and test all the options. But, much more importantly: I believe it's less likely to confuse new users. I appreciate that not everybody agrees with me here, and this is part of the reason why we're voting! Other Debian people have also blogged about their voting choices (Gunnar Wolf and Ian Jackson so far), and I thank them for sharing their reasoning too. For the avoidance of doubt: my goal for this vote was simply to get a clear direction on how to proceed here. Although I proposed Choice 1 (Only one installer, including non-free firmware), I also seconded several of the other ballot options. Of course I will accept the will of the project when the result is announced - I'm not going to do anything silly like throw a tantrum or quit the project over this! Finally If you're a DD and you haven't voted already, please do so - this is an important choice for the Debian project.

24 September 2022

Ian Jackson: Please vote in favour of the Debian Social Contract change

tl;dr: Please vote in favour of the Debian Social Contract change, by ranking all of its options above None of the Above. Rank the SC change options above corresponding options that do not change the Social Contract. Vote to change the SC even if you think the change is not necessary for Debian to prominently/officially provide an installer with-nonfree-firmware. Why vote for SC change even if I think it's not needed? I'm addressing myself primarily to the reader who agrees with me that Debian ought to be officially providing with-firmware images. I think it is very likely that the winning option will be one of the ones which asks for an official and prominent with-firmware installer. However, many who oppose this change believe that it would be a breach of Debian's Social Contract. This is a very reasonable and arguable point of view. Indeed, I'm inclined to share it. If the winning option is to provide a with-firmware installer (perhaps, only a with-firmware installer) those people will feel aggrieved. They will, quite reasonably, claim that the result of the vote is illegitimate - being contrary to Debian's principles as set out in the Social Contract, which require a 3:1 majority to change. There is even the possibility that the Secretary may declare the GR result void, as contrary to the Constitution! (Sadly, I am not making this up.) This would cast Debian into (yet another) acrimonious constitutional and governance crisis. The simplest answer is to amend the Social Contract to explicitly permit what is being proposed. Holger's option F and Russ's option E do precisely that. Amending the SC is not an admission that it was legally necessary to do so. It is practical politics: it ensures that we have clear authority and legitimacy. Aren't we softening Debian's principles? I think prominently distributing an installer that can work out of the box on the vast majority of modern computers would help Debian advance our users' freedom. 
I see user freedom as a matter of practical capability, not theoretical purity. Anyone living in the modern world must make compromises. It is Debian's job to help our users (and downstreams) minimise those compromises and retain as much control as possible over the computers in their life. Insisting that a user buys different hardware, or forcing them to a different distro, does not serve that goal. I don't really expect to convince anyone with such a short argument, but I do want to make the point that providing an installer that users can use to obtain a lot of practical freedom is also, for many of us, a matter of principle.


22 September 2022

Jonathan Dowland: Nine Inch Nails, Cornwall, June

In June I travelled to see Nine Inch Nails perform two nights at the Eden Project in Cornwall. It'd been eight years since I last saw them live and when they announced the Eden shows, I thought it might be the only chance I'd get to see them for a long time. I committed, and sod's law, a week or so later they announced a handful of single-night UK club shows. On the other hand, on previous tours where they'd typically book two club nights in each city, I've attended one night and always felt I should have done both, so this time I was making that happen. Newquay
approach by air
Towan Beach (I think)
For personal reasons it's been a difficult year so it was nice to treat myself to a mini holiday. I stayed in Newquay, a seaside town with many similarities to the North East coast, as well as many differences. It's much bigger, and although we have a thriving surfing community in Tynemouth, Newquay have it on another level. They also have a lot more tourism, which is a double-edged sword: in Newquay, besides surfing, there was not a lot to do. There's a lot of tourist tat shops, and bars and cafes (some very nice ones), but no book shops, no record shops, very few of the quaint, unique boutique places we enjoy up here and possibly take for granted. If you want tie-dyed t-shirts though, you're sorted. Nine Inch Nails have a long-established, independently fan-run forum called Echoing The Sound. There is now also an official Discord server. I asked on both whether anyone was around in Newquay and wanted to meet up: not many people were! But I did meet a new friend, James, for a quiet drink. He was due to share a taxi with Sarah, who was flying in but her flight was delayed and she had to figure out another route. Eden Project
the Eden Project
The Eden Project, the venue itself, is a fascinating place. I didn't realise until I'd planned most of my time there that the gig tickets granted you free entry into the Project on the day of the gig as well as the day after. It was quite tricky to get from Newquay to the Eden Project (I would have been better off staying in St Austell itself, perhaps), so I didn't take advantage of this, but I did have a couple of hours total to explore a little bit at the venue before the gig on each night. Friday 17th (sunny) Once I got to the venue I managed to meet up with several names from ETS and the Discord: James, Sarah (who managed to re-arrange flights), Pete and his wife (sorry I missed your name), Via Tenebrosa (she of crab hat fame), Dave (DaveDiablo), Elliot and his sister and finally James (sheapdean), someone who I've been talking to online for over a decade and finally met in person (and who taped both shows). I also tried to meet up with a friend from the Debian UK community (hi Lief) but I couldn't find him! Support for Friday was Nitzer Ebb, who I wasn't familiar with before. There were two men on stage, one operating instruments, the other singing. It was a tough time to warm up the crowd, the venue was still very empty and it was very bright and sunny, but I enjoyed what I was hearing. They're definitely on my list. I later learned that the band's regular singer (Doug McCarthy) was unable to make it, and so the guy I was watching (Bon Harris) was standing in for full vocal duties. This made the performance (and their subsequent one at Hellfest the week after) all the more impressive.
pic of the band
Via (with crab hat), Sarah, me (behind). pic by kraw
(Day) and night one, Friday, was very hot and sunny and the band seemed a little uncomfortable exposed on stage with little cover. Trent said as much at least once. The setlist was eclectic, and I finally heard some of my white whale songs. Highlights for me were The Perfect Drug, which was unplayed from 1997-2018 and has now become a staple, and the second ever performance of Everything, the first being a few days earlier. Also notable was three cuts in a row from the last LP, Bad Witch, Heresy and Love Is Not Enough. Saturday 18th (rain)
with Elliot, before
Day/night 2, Saturday, was rainy all day. Support was Yves Tumor, who were an interesting clash of styles: a Prince/Bowie-esque inspired lead clashing with a rock-out lead guitarist styling himself similarly to Brian May. I managed to find Sarah, Elliot (new gig best-buddy), Via and James (sheapdean) again. Pete was at this gig too, but opted to take a more relaxed position than the rail this time. I also spent a lot of time talking to a Canadian guy on a press pass (both nights) whose name I'm ashamed to have forgotten. The dank weather had Nine Inch Nails in their element. I think night one had the more interesting setlist, but night two had the best performance, hands down. Highlights for me were mostly a string of heavier songs (in rough order of scarcity, from common to rarely played): wish, burn, letting you, reptile, every day is exactly the same, the line begins to blur, and finally, happiness in slavery, the first UK performance since 1994. This was a crushing set. A girl in front of me was really suffering with the cold and rain after waiting at the venue all day to get a position on the rail. I thought she was going to pass out. A roadie with NIN noticed, and came over and gave her his jacket. He said if she waited to the end of the show and returned his jacket he'd give her a setlist, and true to his word, he did. This was a really nice thing to happen and really gave the impression that the folks who work on these shows are caring people.
Yep I was this close
A fuckin' rainbow! Photo by "Lazereth of Nazereth"
Afterwards
Night two did have some gentler songs and moments to remember: a re-arranged Sanctified (which ended a nineteen-year hiatus in 2013), And All That Could Have Been (recorded 2002, first played 2018), and La Mer, during which the rain broke and we were presented with a beautiful pink-hued rainbow. They then segued into Less Than, providing the comic moment of the night when Trent noticed the rainbow mid-song; now a meme that will go down in NIN fan history. Wrap-up This was a blow-out, once in a lifetime trip to go and see a band who are at the top of their career in terms of performance. One problem I've had with NIN gigs in the past is suffering gig flashback to them when I go to other (inferior) gigs afterwards, and I'm pretty sure I will have this problem again. Doing both nights was worth it, the two experiences were very different and each had its own unique moments. The venue was incredible, and Cornwall is (modulo tourist trap stuff) beautiful.

23 August 2022

Ian Jackson: prefork-interp - automatic startup time amortisation for all manner of scripts

The problem I had - Mason, so, sadly, FastCGI Since the update to current Debian stable, the website for YARRG, (a play-aid for Puzzle Pirates which I wrote some years ago), started to occasionally return "Internal Server Error", apparently due to bug(s) in some FastCGI libraries. I was using FastCGI because the website is written in Mason, a Perl web framework, and I found that Mason CGI calls were slow. I'm using CGI - yes, trad CGI - via userv-cgi. Running Mason this way would compile the template for each HTTP request just when it was rendered, and then throw the compiled version away. The more modern approach of an application server doesn't scale well to a system which has many web applications most of which are very small. The admin overhead of maintaining a daemon, and corresponding webserver config, for each such service would be prohibitive, even with some kind of autoprovisioning setup. FastCGI has an interpreter wrapper which seemed like it ought to solve this problem, but it's quite inconvenient, and often flaky. I decided I could do better, and set out to eliminate FastCGI from my setup. The result seems to be a success; once I'd done all the hard work of writing prefork-interp, I found the result very straightforward to deploy. prefork-interp prefork-interp is a small C program which wraps a script, plus a scripting language library to cooperate with the wrapper program. Together they achieve the following: Features: Important properties not always satisfied by competing approaches: Swans paddling furiously The implementation is much more complicated than the (apparent) interface. I won't go into all the details here (there are some terrifying diagrams in the source code if you really want), but some highlights: We use an AF_UNIX socket (hopefully in /run/user/UID, but in ~ if not) for rendezvous. We can try to connect without locking, but we must protect the socket with a separate lockfile to avoid two concurrent restart attempts. 
We want stderr from the script setup (pre-initialisation) to be delivered to the caller, so the script ought to inherit our stderr and then will need to replace it later. Twice, in fact, because the daemonic server process can't have a stderr. When a script is restarted for any reason, any old socket will be removed. We want the old server process to detect that and quit. (If it hung about, it would wait for the idle timeout; if this happened a lot - eg, a constantly changing set of services - we might end up running out of pids or something.) Spotting the socket disappearing, without polling, involves use of a library capable of using inotify (or the equivalent elsewhere). Choosing a C library to do this is not so hard, but portable interfaces to this functionality can be hard to find in scripting languages, and also we don't want every language binding to have to reimplement these checks. So for this purpose there's a little watcher process, and associated IPC. When an invoking instance of prefork-interp is killed, we must arrange for the executing service instance to stop reading from its stdin (and, ideally, writing its stdout). Otherwise it's stealing input from prefork-interp's successors (maybe the user's shell)! Cleanup ought not to depend on positive actions by failing processes, so each element of the system has to detect failures of its peers by means such as EOF on sockets/pipes. Obtaining prefork-interp I put this new tool in my chiark-utils package, which is a collection of useful miscellany. It's available from git. Currently I make releases by uploading to Debian, where prefork-interp has just hit Debian unstable, in chiark-utils 7.0.0. Support for other scripting languages I would love Python to be supported. If any pythonistas reading this think you might like to help out, please get in touch. 
The specification for the protocol, and what the script library needs to do, is documented in the source code Future plans for chiark-utils chiark-utils as a whole is in need of some tidying up of its build system and packaging. I intend to try to do some reorganisation. Currently I think it would be better to organising the source tree more strictly with a directory for each included facility, rather than grouping compiled and scripts together. The Debian binary packages should be reorganised more fully according to their dependencies, so that installing a program will ensure that it works. I should probably move the official git repo from my own git+gitweb to a forge (so we can have MRs and issues and so on). And there should be a lot more testing, including Debian autopkgtests.
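The rendezvous scheme from the "Swans paddling furiously" highlights (connect without locking, and take a separate lockfile only when a restart is needed) could be sketched roughly as follows. This is an illustrative Python sketch, not prefork-interp's actual C implementation: the function name `connect_or_restart`, its parameters, and the `start_server` callback are all hypothetical.

```python
import fcntl
import os
import socket


def connect_or_restart(sock_path, lock_path, start_server):
    """Return a connected AF_UNIX socket, (re)starting the server if needed.

    start_server is a hypothetical callback that must leave a server
    listening on sock_path before it returns.
    """
    def try_connect():
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(sock_path)
            return s
        except (FileNotFoundError, ConnectionRefusedError):
            s.close()
            return None

    # Fast path: no lock is needed if a server is already listening.
    s = try_connect()
    if s is not None:
        return s

    # Slow path: serialise restart attempts with a separate lockfile.
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        # Someone else may have restarted the server while we waited.
        s = try_connect()
        if s is not None:
            return s
        # Remove any stale socket, then start a fresh server.
        try:
            os.unlink(sock_path)
        except FileNotFoundError:
            pass
        start_server(sock_path)
        s = try_connect()
        if s is None:
            raise RuntimeError("server did not come up")
        return s
```

The second connection attempt inside the lock is what prevents two concurrent callers from both restarting the server: the loser of the race finds the winner's fresh socket and simply uses it.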
edited 2022-08-23 10:30 +01:00 to improve the formatting



8 August 2022

Ian Jackson: dkim-rotate - rotation and revocation of DKIM signing keys

Background Internet email is becoming more reliant on DKIM, a scheme for having mail servers cryptographically sign emails. The Big Email providers have started silently spambinning messages that lack either DKIM signatures, or SPF. DKIM is arguably less broken than SPF, so I wanted to deploy it. But it has a problem: if done in a naive way, it makes all your emails non-repudiable, forever. This is not really a desirable property - at least, not desirable for you, although it can be nice for someone who (for example) gets hold of leaked messages obtained by hacking mailboxes. This problem was described at some length in Matthew Green's article Ok Google: please publish your DKIM secret keys. Following links from that article does get you to a short script to achieve key rotation but it had a number of problems, and wasn't useable in my context.

dkim-rotate So I have written my own software for rotating and revoking DKIM keys: dkim-rotate. I think it is a good solution to this problem, and it ought to be deployable in many contexts (and readily adaptable to those it doesn't already support). Here's the feature list taken from the README:

Complications It seems like it should be a simple problem. Keep N keys, and every day (or whatever), generate and start using a new key, and deliberately leak the oldest private key. But, things are more complicated than that. Considerably more complicated, as it turns out.

I didn't want the DKIM key rotation software to have to edit the actual DNS zones for each relevant mail domain. That would tightly entangle the mail server administration with the DNS administration, and there are many contexts (including many of mine) where these roles are separated. The solution is to use DNS aliases (CNAME). But, now we need a fixed, relatively small, set of CNAME records for each mail domain. That means a fixed, relatively small set of key identifiers ("selectors" in DKIM terminology), which must be used in rotation.

We don't want the private keys to be published via the DNS because that makes an ever-growing DNS zone, which isn't great for performance; and, because we want to place barriers in the way of processes which might enumerate the set of keys we use (and the set of keys we have leaked) and keep records of what status each key had when. So we need a separate publication channel - for which a webserver was the obvious answer. We want the private keys to be readily noticeable and findable by someone who is verifying an alleged leaked email dump, but to be hard to enumerate. (One part of the strategy for this is to leave a note about it, with the prospective private key url, in the email headers.)

The key rotation operations are more complicated than first appears, too. The short summary, above, neglects to consider the fact that DNS updates have a nonzero propagation time: if you change the DNS, not everyone on the Internet will experience the change immediately. So as well as a timeout for how long it might take an email to be delivered (ie, how long the DKIM signature remains valid), there is also a timeout for how long to wait after updating the DNS, before relying on everyone having got the memo. (This same timeout applies both before starting to sign emails with a new key, and before deliberately compromising a key which has been withdrawn and deadvertised.)

Updating the DNS, and the MTA configuration, are fallible operations. So we need to cope with out-of-course situations, where a previous DNS or MTA update failed. In that case, we need to retry the failed update, and not proceed with key rotation. We mustn't start the timer for the key rotation until the update has been implemented.

The rotation script will usually be run by cron, but it might be run by hand, and when it is run by hand it ought not to jump the gun and do anything too early (ie, before the relevant timeout has expired). cron jobs don't always run, and don't always run at precisely the right time. (And there's daylight saving time to consider, too.) So overall, it's not sufficient to drive the system via cron and have it proceed by one unit of rotation on each run.

And, hardest of all, I wanted to support post-deployment configuration changes, while continuing to keep the whole system operational. Otherwise, you have to bake in all the timing parameters right at the beginning and can't change anything ever. So for example, I wanted to be able to change the email and DNS propagation delays, and even the number of selectors to use, without adversely affecting the delivery of already-sent emails, and without having to shut anything down.

I think I have solved these problems. The resulting system is one which keeps track of the timing constraints, and the next event which might occur, on a per-key basis. It calculates on each run, which key(s) can be advanced to the next stage of their lifecycle, and performs the appropriate operations. The regular key update schedule is then an emergent property of the config parameters and cron job schedule. (I provide some example config.)

Exim Integrating dkim-rotate itself with Exim was fairly easy. The lsearch lookup function can be used to fish information out of a suitable data file maintained by dkim-rotate. But a final awkwardness was getting Exim to make the right DKIM signatures, at the right time. When making a DKIM signature, one must choose a signing authority domain name: who should we claim to be? (This is the SDID in DKIM terms.) A mailserver that handles many different mail domains will be able to make good signatures on behalf of many of them. It seems to me that domain ought to be the mail domain in the From: header of the email. (The RFC doesn't seem to be clear on what is expected.) Exim doesn't seem to have anything builtin to do that.

And, you only want to DKIM-sign emails that are originated locally or from trustworthy sources. You don't want to DKIM-sign messages that you received from the global Internet, and are sending out again (eg because of an email alias or mailing list). In theory if you verify DKIM on all incoming emails, you could avoid being fooled into signing bad emails, but rejecting all non-DKIM-verified email would be a very strong policy decision. Again, Exim doesn't seem to have cooked machinery.

The resulting Exim configuration parameters run to 22 lines, and because they're parameters to an existing config item (the smtp transport) they can't even easily be deployed as a drop-in file via Debian's "split config" Exim configuration scheme. (I don't know if the file written for Exim's use by dkim-rotate would be suitable for other MTAs, but this part of dkim-rotate could easily be extended.)

Conclusion I have today released dkim-rotate 0.4, which is the first public release for general use. I have it deployed and working, but it's new so there may well be bugs to work out. If you would like to try it out, you can get it via git from Debian Salsa. (Debian folks can also find it freshly in Debian unstable.)
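The per-key scheduling described under "Complications" (each key remembers when it entered its current stage, and a run advances only those keys whose wait has expired and whose update succeeded) might look something like the sketch below. It is not taken from dkim-rotate: the stage names, the `signing_period` parameter, and the `apply_update` callback are assumptions made for illustration, and the real lifecycle may well differ.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # Hypothetical lifecycle; the real tool's stages may differ.
    ADVERTISED = 1    # public key in DNS, not yet used for signing
    SIGNING = 2       # currently used to sign outgoing mail
    RETIRED = 3       # no longer signing; mail may still be in flight
    DEADVERTISED = 4  # public key withdrawn from DNS
    REVEALED = 5      # private key deliberately published


@dataclass
class Key:
    selector: str
    stage: Stage
    since: float  # when the current stage was *successfully* entered


def min_wait(stage, dns_delay, mail_delay, signing_period):
    """Minimum time to spend in a stage; None for the final stage."""
    return {
        Stage.ADVERTISED: dns_delay,      # wait for DNS to propagate
        Stage.SIGNING: signing_period,    # assumed rotation interval
        Stage.RETIRED: mail_delay,        # wait for in-flight mail
        Stage.DEADVERTISED: dns_delay,    # wait for DNS to propagate
    }.get(stage)


def advance(keys, now, apply_update, **delays):
    """Advance each key by at most one stage if its wait has expired."""
    for key in keys:
        wait = min_wait(key.stage, **delays)
        if wait is None or now - key.since < wait:
            continue  # too early: running by hand cannot jump the gun
        nxt = Stage(key.stage.value + 1)
        if apply_update(key, nxt):  # fallible DNS/MTA update
            key.stage, key.since = nxt, now  # timer starts only on success
        # On failure the state is unchanged, so the next run retries.
```

Because the decision depends only on stored timestamps and the current time, the same function behaves correctly whether cron runs it late, twice, or not at all for a while, and changing a delay in the configuration simply changes when the next advance becomes eligible.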


30 July 2022

Ian Jackson: chiark's skip-skip-cross-up-grade

Two weeks ago I upgraded chiark from Debian jessie i386 to bullseye amd64, after nearly 30 years running Debian i386. This went really quite well, in fact!

Background chiark is my "colo" - a server I run, which lives in a data centre in London. It hosts ~200 users with shell accounts, various websites and mailing lists, moderators for a number of USENET newsgroups, and countless other services. chiark's internal setup is designed to enable my users to do a maximum number of exciting things with a minimum of intervention from me. chiark's OS install dates to 1993, when I installed Debian 0.93R5, the first version of Debian to advertise the ability to be upgraded without reinstalling. I think that makes it one of the oldest Debian installations in existence. Obviously it's had several new hardware platforms too. (There was a prior install of Linux on the initial hardware, remnants of which can maybe still be seen in some obscure corners of chiark's /usr/local.) chiark's install is also at the very high end of the installation complexity, and customisation, scale: reinstalling it completely would be an enormous amount of work. And it's unique.

chiark's upgrade history chiark's last major OS upgrade was to jessie (Debian 8, released in April 2015). That was in 2016. Since then we have been relying on Debian's excellent security support posture, and the Debian LTS and more recently Freexian's Debian ELTS projects and some local updates. The use of ELTS - which supports only a subset of packages - was particularly uncomfortable. Additionally, chiark was installed with 32-bit x86 Linux (Debian i386), since that was what was supported and available at the time. But 32-bit is looking very long in the tooth.

Why do a skip upgrade So, I wanted to move to the fairly recent stable release - Debian 11 (bullseye), which is just short of a year old. And I wanted to "crossgrade" (as it's called) to 64-bit. In the past, I have found I have had greater success by doing "direct" upgrades, skipping intermediate releases, rather than by following the officially-supported path of going via every intermediate release. Doing a skip upgrade avoids exposure to any packaging bugs which were present only in intermediate release(s). Debian does usually fix bugs, but Debian has many cautious users, so it is not uncommon for bugs to be found after release, and then not be fixed until the next one. A skip upgrade avoids the need to try to upgrade to already-obsolete releases (which can involve messing about with multiple snapshots from snapshot.debian.org). It is also significantly faster and simpler, which is important not only because it reduces downtime, but also because it removes opportunities (and reduces the time available) for things to go badly. One downside is that sometimes maintainers aggressively remove compatibility measures for older releases. (And compatibility packages are generally removed quite quickly by even cautious maintainers.) That means that the sysadmin who wants to skip-upgrade needs to do more manual fixing of things that haven't been dealt with automatically. And occasionally one finds compatibility problems that show up only when mixing very old and very new software, that no-one else has seen.

Crossgrading Crossgrading is fairly complex and hazardous. It is well supported by the low level tools (eg, dpkg) but the higher-level packaging tools (eg, apt) get very badly confused. Nowadays the system is so complex that downloading things by hand and manually feeding them to dpkg is impractical, other than as a very occasional last resort. The approach, generally, has been to set the system up to "want to" be the new architecture, run apt in a download-only mode, and do the package installation manually, with some fixing up and retrying, until the system is coherent enough for apt to work. This is the approach I took. (In current releases, there are tools that will help but they are only in recent releases and I wanted to go direct. I also doubted that they would work properly on chiark, since it's so unusual.)

Peril and planning Overall, this was a risky strategy to choose. The package dependencies wouldn't necessarily express all of the sequencing needed. But it still seemed that if I could come up with a working recipe, I could do it. I restored most of one of chiark's backups onto a scratch volume on my laptop. With the LVM snapshot tools and chroots, I was able to develop and test a set of scripts that would perform the upgrade. This was a very effective approach: my super-fast laptop, with local caches of the package repositories, was able to do many edit, test, debug cycles. My recipe made heavy use of snapshot.debian.org, to make sure that it wouldn't rot between testing and implementation. When I had a working scheme, I told my users about the planned downtime. I warned everyone it might take even 2 or 3 days. I made sure that my access arrangements to the data centre were in place, in case I needed to visit in person. (I have remote serial console and power cycler access.)

Reality - the terrible rescue install My first task on taking the service down was to check that the emergency rescue installation worked: chiark has an ancient USB stick in the back, which I can boot to from the BIOS. The idea being that many things that go wrong could be repaired from there. I found that that install was too old to understand chiark's storage arrangements. mdadm tools gave very strange output. So I needed to upgrade it. After some experiments, I rebooted back into the main install, bringing chiark's service back online. I then used the main install of chiark as a kind of meta-rescue-image for the rescue-image. The process of getting the rescue image upgraded (not even to amd64, but just to something not totally ancient) was fraught. Several times I had to rescue it by copying files in from the main install outside. And, the rescue install was on a truly ancient 2G USB stick which was terribly terribly slow, and also very small. I hadn't done any significant planning for this subtask, because it was low-risk: there was little way to break the main install. Due to all these adverse factors, sorting out the rescue image took five hours. If I had known how long it would take, at the beginning, I would have skipped it. 5 hours is more than it would have taken to go to London and fix something in person.

Reality - the actual core upgrade I was able to start the actual upgrade in the mid-afternoon. I meticulously checked and executed the steps from my plan. The terrifying scripts which sequenced the critical package updates ran flawlessly. Within an hour or so I had a system which was running bullseye amd64, albeit with many important packages still missing or unconfigured. So I didn't need the rescue image after all, nor to go to the datacentre.

Fixing all the things Then I had to deal with all the inevitable fallout from an upgrade. Notable incidents:

exim4 has a new tainting system This is to try to help the sysadmin avoid writing unsafe string interpolations. ("Little Bobby Tables.") This was done by Exim upstream in a great hurry as part of a security response process. The new checks meant that the mail configuration did not work at all. I had to turn off the taint check completely. I'm fairly confident that this is correct, because I am hyper-aware of quoting issues and all of my configuration is written to avoid the problems that tainting is supposed to avoid. One particular annoyance is that the approach taken for sqlite lookups makes it totally impossible to use more than one sqlite database. I think the sqlite quoting operator which one uses to interpolate values produces tainted output? I need to investigate this properly.

LVM now ignores PVs which are directly contained within LVs by default chiark has LVM-on-RAID-on-LVM. This generally works really well. However, there was one edge case where I ended up without the intermediate RAID layer. The result is LVM-on-LVM. But recent versions of the LVM tools do not look at PVs inside LVs, by default. This is to help you avoid corrupting the state of any VMs you have on your system. I didn't know that at the time, though. All I knew was that LVM was claiming my PV was "unusable", and wouldn't explain why. I was about to start on a thorough reading of the 15,000-word essay that is the commentary in the default /etc/lvm/lvm.conf to try to see if anything was relevant, when I received a helpful tipoff on IRC pointing me to the scan_lvs option. I need to file a bug asking for the LVM tools to explain why they have declared a PV unusable.

apache2's default config no longer read one of my config files I had to do a merge (of my changes vs the maintainers' changes) for /etc/apache2/apache2.conf. When doing this merge I failed to notice that the file /etc/apache2/conf.d/httpd.conf was no longer included by default. My merge dropped that line. There were some important things in there, and until I found this the webserver was broken.

dpkg --skip-same-version DTWT during a crossgrade (This is not a "fix all the things" - I found it when developing my upgrade process.) When doing a crossgrade, one often wants to say to dpkg "install all these things, but don't reinstall things that have already been done". That's what --skip-same-version is for. However, the logic had not been updated as part of the work to support multiarch, so it was wrong. I prepared a patched version of dpkg, and inserted it in the appropriate point in my prepared crossgrade plan. The patch is now filed as bug #1014476 against dpkg upstream.

Mailman Mailman is no longer in bullseye. It's only available in the previous release, buster. bullseye has Mailman 3 which is a totally different system - requiring, basically, a completely new install and configuration. To even preserve existing archive links (a very important requirement) is decidedly nontrivial. I decided to punt on this whole situation. Currently chiark is running buster's version of Mailman. I will have to deal with this at some point and I'm not looking forward to it.

Python Of course that Mailman is Python 2. The Python project's extremely badly handled transition includes a recommendation to change the meaning of #!/usr/bin/python from Python 2, to Python 3. But Python 3 is a new language, barely compatible with Python 2 even in the most recent iterations of both, and it is usual to need to coinstall them. Happily Debian have provided the python-is-python2 package to make things work sensibly, albeit with unpleasant imprecations in the package summary description.

USENET news Oh my god. INN uses many non-portable data formats, which just depend on your C types. And there are complicated daemons, statically linked libraries which cache on-disk data, and much to go wrong. I had numerous problems with this, and several outages and malfunctions. I may write about that on a future occasion.
(edited 2022-07-20 11:36 +01:00 and 2022-07-30 12:28+01:00 to fix typos)



17 June 2022

Antoine Beaupré: Matrix notes

I have some concerns about Matrix (the protocol, not the movie that came out recently, although I do have concerns about that as well). I've been watching the project for a long time, and it seems more and more like a promising alternative to many protocols like IRC, XMPP, and Signal. This review may sound a bit negative, because it focuses on those concerns. I am the operator of an IRC network and people keep asking me to bridge it with Matrix. I have myself considered just giving up on IRC and converting to Matrix. This space is a living document exploring my research of that problem space. The TL;DR: is that no, I'm not setting up a bridge just yet, and I'm still on IRC. This article was written over the course of the last three months, but I have been watching the Matrix project for years (my logs seem to say 2016 at least). The article is rather long. It will likely take you half an hour to read, so copy this over to your ebook reader, your tablet, or dead trees, and lean back and relax as I show you around the Matrix. Or, alternatively, just jump to a section that interests you, most likely the conclusion.

Introduction to Matrix Matrix is an "open standard for interoperable, decentralised, real-time communication over IP. It can be used to power Instant Messaging, VoIP/WebRTC signalling, Internet of Things communication - or anywhere you need a standard HTTP API for publishing and subscribing to data whilst tracking the conversation history". It's also (when compared with XMPP) "an eventually consistent global JSON database with an HTTP API and pubsub semantics - whilst XMPP can be thought of as a message passing protocol." According to their FAQ, the project started in 2014, has about 20,000 servers, and millions of users. Matrix works over HTTPS but over a special port: 8448.

Security and privacy I have some concerns about the security promises of Matrix. It's advertised as "secure" with "E2E [end-to-end] encryption", but how does it actually work?

Data retention defaults One of my main concerns with Matrix is data retention, which is a key part of security in a threat model where (for example) a hostile state actor wants to surveil your communications and can seize your devices. On IRC, servers don't actually keep messages all that long: they pass them along to other servers and clients as fast as they can, only keep them in memory, and move on to the next message. There are no concerns about data retention on messages (and their metadata) other than the network layer. (I'm ignoring the issues with user registration, which is a separate, if valid, concern.) Obviously, a hostile server could log everything passing through it, but IRC federations are normally tightly controlled. So, if you trust your IRC operators, you should be fairly safe. Obviously, clients can (and often do, even if OTR is configured!) log all messages, but this is generally not the default. Irssi, for example, does not log by default. IRC bouncers are more likely to log to disk, of course, to be able to do what they do. Compare this to Matrix: when you send a message to a Matrix homeserver, that server first stores it in its internal SQL database. Then it will transmit that message to all clients connected to that server and room, and to all other servers that have clients connected to that room. Those remote servers, in turn, will keep a copy of that message and all its metadata in their own database, by default forever. On encrypted rooms those messages are encrypted, but not their metadata. There is a mechanism to expire entries in Synapse, but it is not enabled by default. So one should generally assume that a message sent on Matrix is never expired.

GDPR in the federation But even if that setting was enabled by default, how do you control it? This is a fundamental problem of the federation: if any user is allowed to join a room (which is the default), those users' servers will log all content and metadata from that room. That includes private, one-on-one conversations, since those are essentially rooms as well. In the context of the GDPR, this is really tricky: who is the responsible party (known as the "data controller") here? It's basically any yahoo who fires up a home server and joins a room. In a federated network, one has to wonder whether GDPR enforcement is even possible at all. But in Matrix in particular, if you want to enforce your right to be forgotten in a given room, you would have to:
  1. enumerate all the users that ever joined the room while you were there
  2. discover all their home servers
  3. start a GDPR procedure against all those servers
I recognize this is a hard problem to solve while still keeping an open ecosystem. But I believe that Matrix should have much stricter defaults towards data retention than right now. Message expiry should be enforced by default, for example. (Note that there are also redaction policies that could be used to implement part of the GDPR automatically, see the privacy policy discussion below on that.) Also keep in mind that, in the brave new peer-to-peer world that Matrix is heading towards, the boundary between server and client is likely to be fuzzier, which would make applying the GDPR even more difficult. Update: this comment links to this post (in German) which apparently studied the question and concluded that Matrix is not GDPR-compliant. In fact, maybe Synapse should be designed so that there's no configurable flag to turn off data retention. A bit like how most system loggers in UNIX (e.g. syslog) come with a log retention system that typically rotates logs after a few weeks or months. Historically, this was designed to keep hard drives from filling up, but it also has the added benefit of limiting the amount of personal information kept on disk in this modern day. (Arguably, syslog doesn't rotate logs on its own, but, say, Debian GNU/Linux, as an installed system, does have log retention policies well defined for installed packages, and those can be discussed. And "no expiry" is definitely a bug.)

Matrix.org privacy policy When I first looked at Matrix, five years ago, Element.io was called Riot.im and had a rather dubious privacy policy:
We currently use cookies to support our use of Google Analytics on the Website and Service. Google Analytics collects information about how you use the Website and Service. [...] This helps us to provide you with a good experience when you browse our Website and use our Service and also allows us to improve our Website and our Service.
When I asked Matrix people about why they were using Google Analytics, they explained this was for development purposes and they were aiming for velocity at the time, not privacy (paraphrasing here). They also included a "free to snitch" clause:
If we are or believe that we are under a duty to disclose or share your personal data, we will do so in order to comply with any legal obligation, the instructions or requests of a governmental authority or regulator, including those outside of the UK.
Those are really broad terms, above and beyond what is typically expected legally. Like the current retention policies, such user tracking and ... "liberal" collaboration practices with the state set a bad precedent for other home servers. Thankfully, since the above policy was published (2017), the GDPR was "implemented" (2018) and it seems like both the Element.io privacy policy and the Matrix.org privacy policy have been somewhat improved since. Notable points of the new privacy policies:
  • 2.3.1.1: the "federation" section actually outlines that "Federated homeservers and Matrix clients which respect the Matrix protocol are expected to honour these controls and redaction/erasure requests, but other federated homeservers are outside of the span of control of Element, and we cannot guarantee how this data will be processed"
  • 2.6: users under the age of 16 should not use the matrix.org service
  • 2.10: Upcloud, Mythic Beast, Amazon, and CloudFlare possibly have access to your data (it's nice to at least mention this in the privacy policy: many providers don't even bother admitting to this kind of delegation)
  • Element 2.2.1: mentions many more third parties (Twilio, Stripe, Quaderno, LinkedIn, Twitter, Google, Outplay, PipeDrive, HubSpot, Posthog, Sentry, and Matomo (phew!)) used when you are paying Matrix.org for hosting
I'm not super happy with all the trackers they have on the Element platform, but then again you don't have to use that service. Your favorite homeserver (assuming you are not on Matrix.org) probably has their own Element deployment, hopefully without all that garbage. Overall, this is all a huge improvement over the previous privacy policy, so hats off to the Matrix people for figuring out a reasonable policy in such a tricky context. I particularly like this bit:
We will forget your copy of your data upon your request. We will also forward your request to be forgotten onto federated homeservers. However - these homeservers are outside our span of control, so we cannot guarantee they will forget your data.
It's great they implemented those mechanisms and, after all, if there's a hostile party in there, nothing can prevent them from using screenshots to just exfiltrate your data away from the client side anyways, even with services typically seen as more secure, like Signal. As an aside, I also appreciate that Matrix.org has a fairly decent code of conduct, based on the TODO CoC which checks all the boxes in the geekfeminism wiki.

Metadata handling Overall, privacy protections in Matrix mostly concern message contents, not metadata. In other words, who's talking with who, when and from where is not well protected. Compared to a tool like Signal, which goes to great lengths to anonymize that data with features like private contact discovery, disappearing messages, sealed senders, and private groups, Matrix is definitely behind. (Note: there is an issue open about message lifetimes in Element since 2020, but it's not even at the MSC stage yet.) This is a known issue (opened in 2019) in Synapse, but this is not just an implementation issue, it's a flaw in the protocol itself. Home servers keep join/leave of all rooms, which gives clear text information about who is talking to whom. Synapse logs may also contain privately identifiable information that home server admins might not be aware of in the first place. Those log rotation policies are separate from the server-level retention policy, which may be confusing for a novice sysadmin. Combine this with the federation: even if you trust your home server to do the right thing, the second you join a public room with third-party home servers, those ideas kind of get thrown out because those servers can do whatever they want with that information. Again, a problem that is hard to solve in any federation. To be fair, IRC doesn't have a great story here either: any client knows not only who's talking to who in a room, but also typically their client IP address. Servers can (and often do) obfuscate this, but often that obfuscation is trivial to reverse. Some servers do provide "cloaks" (sometimes automatically), but that's kind of a "slap-on" solution that actually moves the problem elsewhere: now the server knows a little more about the user. Overall, I would worry much more about a Matrix home server seizure than an IRC or Signal server seizure. 
Signal does get subpoenas, and they can only give out a tiny bit of information about their users: their phone number, and their registration, and last connection date. Matrix carries a lot more information in its database.

Amplification attacks on URL previews I (still!) run an Icecast server and sometimes share links to it on IRC which, obviously, also ends up on (more than one!) Matrix home servers because some people connect to IRC using Matrix. This, in turn, means that Matrix will connect to that URL to generate a link preview. I feel this outlines a security issue, especially because those sockets would be kept open seemingly forever. I tried to warn the Matrix security team but somehow, I don't think this issue was taken very seriously. Here's the disclosure timeline:
  • January 18: contacted Matrix security
  • January 19: response: already reported as a bug
  • January 20: response: can't reproduce
  • January 31: timeout added, considered solved
  • January 31: I respond that I believe the security issue is underestimated, ask for clearance to disclose
  • February 1: response: asking for two weeks delay after the next release (1.53.0) including another patch, presumably in two weeks' time
  • February 22: Matrix 1.53.0 released
  • April 14: I notice the release, ask for clearance again
  • April 14: response: referred to the public disclosure
There are a couple of problems here:
  1. the bug was publicly disclosed in September 2020, and not considered a security issue until I notified them, and even then, I had to insist
  2. no clear disclosure policy timeline was proposed or seems established in the project (there is a security disclosure policy but it doesn't include any predefined timeline)
  3. I wasn't informed of the disclosure
  4. the actual solution is a size limit (10MB, already implemented), a time limit (30 seconds, implemented in PR 11784), and a content type allow list (HTML, "media" or JSON, implemented in PR 11936), and I'm not sure it's adequate
  5. (pure vanity:) I did not make it to their Hall of fame
I'm not sure those solutions are adequate because they all seem to assume a single home server will pull that one URL for a little while then stop. But in a federated network, many (possibly thousands) home servers may be connected in a single room at once. If an attacker drops a link into such a room, all those servers would connect to that link all at once. This is an amplification attack: a small amount of traffic will generate a lot more traffic to a single target. It doesn't matter there are size or time limits: the amplification is what matters here. It should also be noted that clients that generate link previews have more amplification because they are more numerous than servers. And of course, the default Matrix client (Element) does generate link previews as well. That said, this is possibly not a problem specific to Matrix: any federated service that generates link previews may suffer from this. I'm honestly not sure what the solution is here. Maybe moderation? Maybe link previews are just evil? All I know is there was this weird bug in my Icecast server and I tried to ring the bell about it, and it feels it was swept under the rug. Somehow I feel this is bound to blow up again in the future, even with the current mitigation.
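To make the amplification argument concrete, here is a back-of-the-envelope sketch. The 10 MB cap comes from the size limit mentioned above; the number of servers and the size of the attacker's message are purely hypothetical figures chosen for illustration, and the function name is my own.

```python
def amplification(request_bytes: int, servers: int, fetch_cap_bytes: int) -> dict:
    """Estimate the traffic hitting a target when every server previews one link."""
    target_bytes = servers * fetch_cap_bytes
    return {
        "attacker_bytes": request_bytes,
        "target_bytes": target_bytes,
        "factor": target_bytes / request_bytes,
    }


# Hypothetical room: 1,000 home servers, a 1 kB message carrying the link,
# and each server fetching up to the 10 MB size limit.
estimate = amplification(
    request_bytes=1_000,
    servers=1_000,
    fetch_cap_bytes=10 * 1_000_000,
)
print(estimate)
```

Under those assumptions, a 1 kB message can pull up to 10 GB from the target. The per-server cap bounds each fetch, but the total still scales with the number of servers in the room.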

Moderation In Matrix, like elsewhere, moderation is a hard problem. There is a detailed moderation guide and much of this problem space is actively worked on in Matrix right now. A fundamental problem with moderating a federated space is that a user banned from a room can rejoin the room from another server. This is why spam is such a problem in Email, and why IRC networks stopped federating ages ago (see the IRC history for that fascinating story).

The mjolnir bot The mjolnir moderation bot is designed to help with some of those things. It can kick and ban users and redact all of a user's messages (as opposed to one by one), all of this across multiple rooms. It can also subscribe to a federated block list published by matrix.org to block known abusers (users or servers). Bans are pretty flexible and can operate at the user, room, or server level. Matrix people suggest making the bot admin of your channels, because you can't take back admin from a user once given.

The command-line tool There's also a new command line tool designed to do things like:
  • System notify users (all users/users from a list, specific user)
  • delete sessions/devices not seen for X days
  • purge the remote media cache
  • select rooms with various criteria (external/local/empty/created by/encrypted/cleartext)
  • purge history of these rooms
  • shutdown rooms
This tool and Mjolnir are based on the admin API built into Synapse.

Rate limiting Synapse has pretty good built-in rate-limiting which blocks repeated login, registration, joining, or messaging attempts. It may also end up throttling servers on the federation based on those settings.
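For readers unfamiliar with how this kind of rate limiting typically works, here is a minimal token-bucket sketch. It is not Synapse's implementation: the class, its parameter names, and the numbers are assumptions made for illustration only.

```python
class TokenBucket:
    """Allow short bursts, then throttle to a steady rate (illustrative only)."""

    def __init__(self, per_second: float, burst_count: int):
        self.per_second = per_second      # steady refill rate
        self.burst_count = burst_count    # maximum number of stored tokens
        self.tokens = float(burst_count)  # start with a full bucket
        self.last = 0.0                   # timestamp of the previous call

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the last call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst_count, self.tokens + elapsed * self.per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical policy: bursts of 3 actions, then one action every 2 seconds.
bucket = TokenBucket(per_second=0.5, burst_count=3)
results = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 0.0, 2.0)]
print(results)
```

The fourth attempt at time 0 is rejected because the burst allowance is spent; two seconds later one token has been refilled and the action goes through again.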

Fundamental federation problems Because users joining a room may come from another server, room moderators are at the mercy of the registration and moderation policies of those servers. Matrix is like IRC's +R mode ("only registered users can join") by default, except that anyone can register their own homeserver, which makes this limited. Server admins can block IP addresses and home servers, but those tools are not easily available to room admins. There is an API (m.room.server_acl in /devtools) but it is not reliable (thanks Austin Huang for the clarification). Matrix has the concept of guest accounts, but it is not used very much, and virtually no client or homeserver supports it. This contrasts with the way IRC works: by default, anyone can join an IRC network even without authentication. Some channels require registration, but in general you are free to join and look around (until you get blocked, of course). I have seen anecdotal evidence (CW: Twitter, nitter link) that "moderating bridges is hell", and I can imagine why. Moderation is already hard enough on one federation; when you bridge a room with another network, you inherit all the problems from that network, but without the full set of abuse control tools from the original network's API...

Room admins Matrix, in particular, has the problem that room administrators (who have the power to redact messages, ban users, and promote other users) are bound to their Matrix ID which is, in turn, bound to their home servers. This implies that a home server administrator could (1) impersonate a given user and (2) use that to hijack the room. So in practice, the home server is the trust anchor for rooms, not the user themselves. That said, if the administrator of server B hijacks user joe on server B, they will hijack that room on that specific server. This will not (necessarily) affect users on the other servers, as servers could refuse parts of the updates or ban the compromised account (or server). It does seem like a major flaw that room credentials are bound to Matrix identifiers, as opposed to the E2E encryption credentials. In an encrypted room even with fully verified members, a compromised or hostile home server can still take over the room by impersonating an admin. That admin (or even a newly minted user) can then send events or listen in on the conversations. This is even more frustrating when you consider that Matrix events are actually signed and therefore have some authentication attached to them, acting like some sort of Merkle tree (as each event contains a link to previous events). That signature, however, is made from the homeserver PKI keys, not the client's E2E keys, which makes E2E feel like it has been "bolted on" later.
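The following toy model illustrates why server-side signatures do not protect against a hostile home server. It is a sketch under loud assumptions: an HMAC stands in for the real signature scheme, the event layout is invented, and the key and user names are hypothetical. It only shows the shape of the problem, not how Matrix encodes or signs events.

```python
import hashlib
import hmac
import json
from typing import Optional


def _payload(sender: str, body: str, prev: Optional[str]) -> bytes:
    # Canonical-ish encoding of the fields covered by the hash and signature.
    return json.dumps({"sender": sender, "body": body, "prev": prev}, sort_keys=True).encode()


def make_event(server_key: bytes, sender: str, body: str, prev: Optional[str]) -> dict:
    """Create an event linked to the previous one and signed with the SERVER key."""
    payload = _payload(sender, body, prev)
    return {
        "sender": sender,
        "body": body,
        "prev": prev,
        "hash": hashlib.sha256(payload).hexdigest(),
        "sig": hmac.new(server_key, payload, hashlib.sha256).hexdigest(),
    }


def verify_chain(server_key: bytes, events: list) -> bool:
    """Check that each event links to its predecessor and carries a valid signature."""
    expected_prev = None
    for ev in events:
        payload = _payload(ev["sender"], ev["body"], ev["prev"])
        good_hash = hashlib.sha256(payload).hexdigest()
        good_sig = hmac.new(server_key, payload, hashlib.sha256).hexdigest()
        if ev["prev"] != expected_prev or ev["hash"] != good_hash:
            return False
        if not hmac.compare_digest(ev["sig"], good_sig):
            return False
        expected_prev = ev["hash"]
    return True


SERVER_KEY = b"hypothetical-homeserver-key"
first = make_event(SERVER_KEY, "@joe:b.example", "hello", None)
# The server admin never needs joe's client keys to mint this event.
forged = make_event(SERVER_KEY, "@joe:b.example", "ban everyone", first["hash"])
print(verify_chain(SERVER_KEY, [first, forged]))  # True: the forgery verifies
```

Because the only key involved belongs to the server, other participants cannot distinguish an event joe actually sent from one minted on his behalf. Binding the signature to a client-held key would close that gap in this toy model.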

Availability While Matrix has a strong advantage over Signal in that it's decentralized (so anyone can run their own homeserver), I couldn't find an easy way to run a "multi-primary" setup, or even a "redundant" setup (even with a single primary backend), short of going full-on "replicate PostgreSQL and Redis data", which is not typically for the faint of heart.

How this works in IRC On IRC, it's quite easy to set up redundant nodes. All you need is:
  1. a new machine (with its own public address with an open port)
  2. a shared secret (or certificate) between that machine and an existing one on the network
  3. a connect block on both servers
That's it: the node will join the network and people can connect to it as usual and share the same user/namespace as the rest of the network. The servers take care of synchronizing state: you do not need to worry about replicating a database server. (Now, experienced IRC people will know there's a catch here: IRC doesn't have authentication built in, and relies on "services" which are basically bots that authenticate users (I'm simplifying, don't nitpick). If that service goes down, the network still works, but then people can't authenticate, and they can start doing nasty things like steal people's identity if they get knocked offline. But still: basic functionality still works: you can talk in rooms and with users that are on the reachable network.)

User identities Matrix is more complicated. Each "home server" has its own identity namespace: a specific user (say @anarcat:matrix.org) is bound to that specific home server. If that server goes down, that user is completely disconnected. They could register a new account elsewhere and reconnect, but then they basically lose all their configuration: contacts, joined channels are all lost. (Also notice how the Matrix IDs don't look like a typical user address like an email in XMPP. They at least did their homework and got the allocation for the scheme.)

Rooms Users talk to each other in "rooms", even in one-to-one communications. (Rooms are also used for other things like "spaces", they're basically used for everything, think "everything is a file" kind of tool.) For rooms, home servers act more like IRC nodes in that they keep a local state of the chat room and synchronize it with other servers. Users can keep talking inside a room if the server that originally hosts the room goes down. Rooms can have a local, server-specific "alias" so that, say, #room:matrix.org is also visible as #room:example.com on the example.com home server. Both addresses refer to the same underlying room. (Finding this in the Element settings is not obvious though, because that "alias" is actually called a "local address" there. So to create such an alias (in Element), you need to go in the room settings' "General" section, "Show more" in "Local address", then add the alias name (e.g. foo), and then that room will be available on your example.com homeserver as #foo:example.com.) So a room doesn't belong to a server, it belongs to the federation, and anyone can join the room from any server (if the room is public, or if invited otherwise). You can create a room on server A and when a user from server B joins, the room will be replicated on server B as well. If server A fails, server B will keep relaying traffic to connected users and servers. A room is therefore not fundamentally addressed with the above alias; instead, it has an internal Matrix ID, which is basically a random string. It has a server name attached to it, but that was made just to avoid collisions. That can get a little confusing. For example, the #fractal:gnome.org room is an alias on the gnome.org server, but the room ID is !hwiGbsdSTZIwSRfybq:matrix.org. That's because the room was created on matrix.org, but the preferred branding is gnome.org now. As an aside, rooms, by default, live forever, even after the last user quits. 
There's an admin API to delete rooms and a tombstone event to redirect to another one, but neither has a GUI yet. The latter is part of MSC1501 ("Room version upgrades") which allows a room admin to close a room, with a message and a pointer to another room.
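The difference between the identifiers quoted above can be sketched with a small parser. The leading characters (@ for users, # for aliases, ! for room IDs) are taken from the examples in this article; the function name and the returned tuple layout are my own choices, and this is not a complete validator.

```python
SIGILS = {"@": "user", "#": "room alias", "!": "room ID"}


def parse_matrix_id(identifier: str) -> tuple:
    """Split an identifier into (kind, localpart, server name)."""
    sigil, rest = identifier[:1], identifier[1:]
    if sigil not in SIGILS or ":" not in rest:
        raise ValueError(f"not a recognised Matrix identifier: {identifier!r}")
    # Split on the first colon only, so a server name with a port survives.
    localpart, server = rest.split(":", 1)
    return SIGILS[sigil], localpart, server


for example in (
    "@anarcat:matrix.org",
    "#fractal:gnome.org",
    "!hwiGbsdSTZIwSRfybq:matrix.org",
):
    print(parse_matrix_id(example))
```

Note how the alias and the room ID for the Fractal room carry different server names: the alias says where the name is published, while the server part of the room ID only records where the room was created.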

Spaces Discovering rooms can be tricky: there is a per-server room directory, but Matrix.org people are trying to deprecate it in favor of "Spaces". Room directories were ripe for abuse: anyone can create a room, so anyone can show up in there. It's possible to restrict who can add aliases, but anyways directories were seen as too limited. In contrast, a "Space" is basically a room that's an index of other rooms (including other spaces), so existing moderation and administration mechanisms that work in rooms can (somewhat) work in spaces as well. This enables a room directory that works across federation, regardless of which server the rooms were originally created on. New users can be added to a space or room automatically in Synapse. (Existing users can be told about the space with a server notice.) This gives admins a way to pre-populate a list of rooms on a server, which is useful to build clusters of related home servers, providing some sort of redundancy, at the room -- not user -- level.
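Since a space is an index that can contain other spaces, listing its rooms is a graph traversal. Here is a minimal sketch of that idea; the room and space names are hypothetical, and the plain dictionary stands in for whatever state a real client or server would consult.

```python
def rooms_in_space(space: str, children: dict) -> list:
    """List every non-space room reachable from a space, visiting each entry once."""
    seen = {space}
    rooms = []
    stack = [space]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child in seen:
                continue  # guards against loops and duplicate listings
            seen.add(child)
            if child in children:
                stack.append(child)  # the child is itself a space
            else:
                rooms.append(child)
    return rooms


# Hypothetical layout: a nested space that also points back at its parent.
children = {
    "#debian-space:example.com": ["#general:example.com", "#dev-space:example.com"],
    "#dev-space:example.com": [
        "#rust:example.com",
        "#general:example.com",
        "#debian-space:example.com",
    ],
}
print(rooms_in_space("#debian-space:example.com", children))
```

The `seen` set matters because nothing in this model stops two spaces from listing each other, and a room can legitimately appear in more than one space.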

Home servers So while you can work around a home server going down at the room level, there's no such thing at the home server level, for user identities. So if you want those identities to be stable in the long term, you need to think about high availability. One limitation is that the domain name (e.g. matrix.example.com) must never change in the future, as renaming home servers is not supported. The documentation used to say you could "run a hot spare" but that has been removed. Last I heard, it was not possible to run a high-availability setup where multiple, separate locations could replace each other automatically. You can have high performance setups where the load gets distributed among workers, but those are based on a shared database (Redis and PostgreSQL) backend. So my guess is it would be possible to create a "warm" spare of a Matrix home server with regular PostgreSQL replication, but that is not documented in the Synapse manual. This sort of setup would also not be useful to deal with networking issues or denial of service attacks, as you will not be able to spread the load over multiple network locations easily. Redis and PostgreSQL heroes are welcome to provide their multi-primary solution in the comments. In the meantime, I'll just point out this is a problem that's handled somewhat more gracefully in IRC, by having the possibility of delegating the authentication layer.

Delegations If you do not want to run a Matrix server yourself, it's possible to delegate the entire thing to another server. There's a server discovery API which uses the .well-known pattern (or SRV records, but that's "not recommended" and a bit confusing) to delegate that service to another server. Be warned that the server still needs to be explicitly configured for your domain. You can't just put:
  "m.server": "matrix.org:443"  
... on https://example.com/.well-known/matrix/server and start using @you:example.com as a Matrix ID. That's because Matrix doesn't support "virtual hosting" and you'd still be connecting to rooms and people with your matrix.org identity, not example.com as you would normally expect. This is also why you cannot rename your home server. The server discovery API is what allows servers to find each other. Clients, on the other hand, use the client-server discovery API: this is what allows a given client to find your home server when you type your Matrix ID on login.
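As a rough illustration of what a server does with that delegation file, here is a sketch that parses the response body into a host and port. It deliberately skips the network fetch and the fuller resolution steps: falling back directly to port 8448 when no port is given is a simplification, and the function name is my own.

```python
import json

DEFAULT_FEDERATION_PORT = 8448  # simplification: used whenever no port is given


def parse_delegation(well_known_body: str) -> tuple:
    """Return (host, port) from the body of /.well-known/matrix/server."""
    target = json.loads(well_known_body)["m.server"]
    host, sep, port = target.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return target, DEFAULT_FEDERATION_PORT


print(parse_delegation('{"m.server": "matrix.org:443"}'))
print(parse_delegation('{"m.server": "matrix.example.com"}'))
```

This only tells a remote server where to connect. As noted above, the server at that address still has to be configured for your domain, so publishing the file alone does not give you a working @you:example.com identity.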

Performance The high availability discussion brushed over the performance of Matrix itself, but let's now dig into that.

Horizontal scalability There were serious scalability issues of the main Matrix server, Synapse, in the past. So the Matrix team has been working hard to improve its design. Since Synapse 1.22 the home server can horizontally scale to multiple workers (see this blog post for details) which can make it easier to scale large servers.

Other implementations There are other promising home server implementations from a performance standpoint (dendrite, Golang, entered beta in late 2020; conduit, Rust, beta; others), but none of those are feature-complete so there's a trade-off to be made there. Synapse is also adding a lot of features fast, so it's an open question whether the others will ever catch up. (I have heard that Dendrite might actually surpass Synapse in features within a few years, which would put Synapse in a more "LTS" situation.)

Latency Matrix can feel slow sometimes. For example, joining the "Matrix HQ" room in Element (from matrix.debian.social) takes a few minutes and then fails. That is because the home server has to sync the entire room state when you join the room. There was promising work on this announced in the lengthy 2021 retrospective, and some of that work landed (partial sync) in the 1.53 release already. Other improvements coming include sliding sync, lazy loading over federation, and fast room joins. So that's actually something that could be fixed in the fairly short term. But in general, communication in Matrix doesn't feel as "snappy" as on IRC or even Signal. It's hard to quantify this without instrumenting a full latency test bed (for example the tools I used in the terminal emulators latency tests), but even just typing in a web browser feels slower than typing in a xterm or Emacs for me. Even in conversations, I "feel" people don't immediately respond as fast. In fact, this could be an interesting double-blind experiment to make: have people guess whether they are talking to a person on Matrix, XMPP, or IRC, for example. My theory would be that people could notice that Matrix users are slower, if only because of the TCP round-trip time each message has to take.

Transport Some courageous person actually made some tests of various messaging platforms on a congested network. His evaluation was basically:
  • Briar: uses Tor, so unusable except locally
  • Matrix: "struggled to send and receive messages", joining a room takes forever as it has to sync all history, "took 20-30 seconds for my messages to be sent and another 20 seconds for further responses"
  • XMPP: "worked in real-time, full encryption, with nearly zero lag"
So that was interesting. I suspect IRC would have also fared better, but that's just a feeling. Other improvements to the transport layer include support for websocket and the CoAP proxy work from 2019 (targeting 100bps links), but both seem stalled at the time of writing. The Matrix people have also announced the pinecone p2p overlay network which aims at solving large, internet-scale routing problems. See also this talk at FOSDEM 2022.

Usability

Onboarding and workflow The workflow for joining a room, when you use Element web, is not great:
  1. click on a link in a web browser
  2. land on (say) https://matrix.to/#/#matrix-dev:matrix.org
  3. offers "Element", yeah that sounds great, let's click "Continue"
  4. land on https://app.element.io/#/room%2F%23matrix-dev%3Amatrix.org and then you need to register, aaargh
As you might have guessed by now, there is a specification to solve this, but web browsers need to adopt it as well, so that's far from actually being solved. At least browsers generally know about the matrix: scheme, it's just not exactly clear what they should do with it, especially when the handler is just another web page (e.g. Element web). In general, when compared with tools like Signal or WhatsApp, Matrix doesn't fare so well in terms of user discovery. I probably have some of my normal contacts that have a Matrix account as well, but there's really no way to know. It's kind of creepy when Signal tells you "this person is on Signal!" but it's also pretty cool that it works, and they actually implemented it pretty well. Registration is also less obvious: in Signal, the app confirms your phone number automatically. It's friction-less and quick. In Matrix, you need to learn about home servers, pick one, register (with a password! aargh!), and then setup encryption keys (not default), etc. It's a lot more friction. And look, I understand: giving away your phone number is a huge trade-off. I don't like it either. But it solves a real problem and makes encryption accessible to a ton more people. Matrix does have "identity servers" that can serve that purpose, but I don't feel confident sharing my phone number there. It doesn't help that the identity servers don't have private contact discovery: giving them your phone number is a more serious security compromise than with Signal. There's a catch-22 here too: because no one feels like giving away their phone numbers, no one does, and everyone assumes that stuff doesn't work anyways. Like it or not, Signal forcing people to divulge their phone number actually gives them critical mass that means actually a lot of my relatives are on Signal and I don't have to install crap like WhatsApp to talk with them.

5 minute clients evaluation Throughout all my tests I evaluated a handful of Matrix clients, mostly from Flathub because almost none of them are packaged in Debian. Right now I'm using Element, the flagship client from Matrix.org, in a web browser window, with the PopUp Window extension. This makes it look almost like a native app, and opens links in my main browser window (instead of a new tab in that separate window), which is nice. But I'm tired of buying memory to feed my web browser, so this indirection has to stop. Furthermore, I'm often getting completely logged off from Element, which means re-logging in, recovering my security keys, and reconfiguring my settings. That is extremely annoying. Coming from Irssi, Element is really "GUI-y" (pronounced "gooey"). Lots of clickety happening. To mark conversations as read, in particular, I need to click-click-click on all the tabs that have some activity. There's no "jump to latest message" or "mark all as read" functionality as far as I could tell. In Irssi the former is built-in (alt-a) and I made a custom /READ command for the latter:
/ALIAS READ script exec \$_->activity(0) for Irssi::windows
And yes, that's a Perl script in my IRC client. I am not aware of any Matrix client that does stuff like that, except maybe Weechat, if we can call it a Matrix client, or Irssi itself, now that it has a Matrix plugin (!). As for other clients, I have looked through the Matrix Client Matrix (confusing right?) to try to figure out which one to try, and, even after selecting Linux as a filter, the chart is just too wide to figure out anything. So I tried those, kind of randomly:
  • Fractal
  • Mirage
  • Nheko
  • Quaternion
Unfortunately, I lost my notes on those, I don't actually remember which one did what. I still have a session open with Mirage, so I guess that means it's the one I preferred, but I remember they were also all very GUI-y. Maybe I need to look at weechat-matrix or gomuks. At least Weechat is scriptable so I could continue playing the power-user. Right now my strategy with messaging (and that includes microblogging like Twitter or Mastodon) is that everything goes through my IRC client, so Weechat could actually fit well in there. Going with gomuks, on the other hand, would mean running it in parallel with Irssi or ... ditching IRC, which is a leap I'm not quite ready to take just yet. Oh, and basically none of those clients (except Nheko and Element) support VoIP, which is still kind of a second-class citizen in Matrix. It does not support large multimedia rooms, for example: Jitsi was used for FOSDEM instead of the native videoconferencing system.

Bots This falls a little outside the "usability" section, but I didn't know where to put this... There are a few Matrix bots out there, and you are likely going to be able to replace your existing bots with Matrix bots. It's true that IRC has a long and impressive history with lots of various bots doing various things, but given how young Matrix is, there's still a good variety:
  • maubot: generic bot with tons of usual plugins like sed, dice, karma, xkcd, echo, rss, reminder, translate, react, exec, gitlab/github webhook receivers, weather, etc
  • opsdroid: framework to implement "chat ops" in Matrix, connects with Matrix, GitHub, GitLab, Shell commands, Slack, etc
  • matrix-nio: another framework, used to build lots more bots like:
    • hemppa: generic bot with various functionality like weather, RSS feeds, calendars, cron jobs, OpenStreetmaps lookups, URL title snarfing, wolfram alpha, astronomy pic of the day, Mastodon bridge, room bridging, oh dear
    • devops: ping, curl, etc
    • podbot: play podcast episodes from AntennaPod
    • cody: Python, Ruby, Javascript REPL
    • eno: generic bot, "personal assistant"
  • mjolnir: moderation bot
  • hookshot: bridge with GitLab/GitHub
  • matrix-monitor-bot: latency monitor
One thing I haven't found an equivalent for is Debian's MeetBot. There's an archive bot but it doesn't have topics or a meeting chair, or HTML logs.

Working on Matrix As a developer, I find Matrix kind of intimidating. The specification is huge. The official specification itself looks somewhat digestible: it's only 6 APIs so that looks, at first, kind of reasonable. But whenever you start asking complicated questions about Matrix, you quickly fall into the Matrix Spec Change specification (which, yes, is a separate specification). And there are literally hundreds of MSCs flying around. It's hard to tell what's been adopted and what hasn't, and even harder to figure out if your specific client has implemented it. (One trendy answer to this problem is to "rewrite it in rust": Matrix are working on implementing a lot of those specifications in a matrix-rust-sdk that's designed to take the implementation details away from users.) Just taking the latest weekly Matrix report, you find that three new MSCs were proposed, just last week! There's even a graph that shows the number of MSCs is progressing steadily, at 600+ proposals total, with the majority (300+) "new". I would guess the "merged" ones are at about 150. That's a lot of text which includes stuff like 3D worlds which, frankly, I don't think you should be working on when you have such important security and usability problems. (The internet as a whole, arguably, doesn't fare much better. RFC600 is a really obscure discussion about "INTERFACING AN ILLINOIS PLASMA TERMINAL TO THE ARPANET". Maybe that's how many MSCs will end up as well, left forgotten in the pits of history.) And that's the thing: maybe the Matrix people have a different objective than I have. They want to connect everything to everything, and make Matrix a generic transport for all sorts of applications, including virtual reality, collaborative editors, and so on. I just want secure, simple messaging. Possibly with good file transfers, and video calls. That it works with existing stuff is good, and it should be federated to remove the "Signal point of failure". 
So I'm a bit worried about the direction all those MSCs are taking, especially when you consider that clients other than Element are still struggling to keep up with basic features like end-to-end encryption or room discovery, never mind voice or spaces...

Conclusion Overall, Matrix is somehow in the space XMPP was a few years ago. It has a ton of features, pretty good clients, and a large community. It seems to have gained some of the momentum that XMPP has lost. It may have the most potential to replace Signal if something bad would happen to it (like, I don't know, getting banned or going nuts with cryptocurrency)... But it's really not there yet, and I don't see Matrix trying to get there either, which is a bit worrisome.

Looking back at history I'm also worried that we are repeating the errors of the past. The history of federated services is really fascinating. IRC, FTP, HTTP, and SMTP were all created in the early days of the internet, and are all still around (except, arguably, FTP, which was removed from major browsers recently). All of them had to face serious challenges in growing their federation. IRC had numerous conflicts and forks, at both the technical and the political level. The history of IRC is really something that anyone working on a federated system should study in detail, because they are bound to make the same mistakes if they are not familiar with it. The "short" version is:
  • 1988: Finnish researcher publishes first IRC source code
  • 1989: 40 servers worldwide, mostly universities
  • 1990: EFnet ("eris-free network") fork which blocks the "open relay", named Eris - followers of Eris form the A-net, which promptly dissolves itself, with only EFnet remaining
  • 1992: Undernet fork, which offered authentication ("services"), routing improvements and timestamp-based channel synchronisation
  • 1994: DALnet fork, from Undernet, again on a technical disagreement
  • 1995: Freenode founded
  • 1996: IRCnet forks from EFnet, following a flame war of historical proportion, splitting the network between Europe and the Americas
  • 1997: Quakenet founded
  • 1999: (XMPP founded)
  • 2001: 6 million users, OFTC founded
  • 2002: DALnet peaks at 136,000 users
  • 2003: IRC as a whole peaks at 10 million users, EFnet peaks at 141,000 users
  • 2004: (Facebook founded), Undernet peaks at 159,000 users
  • 2005: Quakenet peaks at 242,000 users, IRCnet peaks at 136,000 (Youtube founded)
  • 2006: (Twitter founded)
  • 2009: (WhatsApp, Pinterest founded)
  • 2010: (TextSecure AKA Signal, Instagram founded)
  • 2011: (Snapchat founded)
  • ~2013: Freenode peaks at ~100,000 users
  • 2016: IRCv3 standardisation effort started (TikTok founded)
  • 2021: Freenode self-destructs, Libera chat founded
  • 2022: Libera peaks at 50,000 users, OFTC peaks at 30,000 users
(The numbers were taken from the Wikipedia page and Netsplit.de. Note that I also include the launch of other networks in parentheses for context.) Pretty dramatic, don't you think? Eventually, somehow, IRC became irrelevant for most people: few people are even aware of it now. With fewer than a million active users, it's smaller than Mastodon, XMPP, or Matrix at this point.1 If I were to venture a guess, I'd say that infighting, lack of a standardization body, and a somewhat annoying protocol meant the network could not grow. It's also possible that the decentralised yet centralised structure of IRC networks limited their reliability and growth. But large social media companies have also taken over the space: observe how IRC numbers peak around the time the wave of large social media companies emerges, especially Facebook (2.9B users!!) and Twitter (400M users).

Where the federated services are in history Right now, Matrix, and Mastodon (and email!) are at the "pre-EFnet" stage: anyone can join the federation. Mastodon has started working on a global block list of fascist servers which is interesting, but it's still an open federation. Right now, Matrix is totally open, but matrix.org publishes a (federated) block list of hostile servers (#matrix-org-coc-bl:matrix.org, yes, of course it's a room). Interestingly, Email is also in that stage, where there are block lists of spammers, and it's a race between those blockers and spammers. Large email providers, obviously, are getting closer to the EFnet stage: you could consider they only accept email from themselves or between themselves. It's getting increasingly hard to deliver mail to Outlook and Gmail for example, partly because of bias against small providers, but also because they are including more and more machine-learning tools to sort through email and those systems are, fundamentally, unknowable. It's not quite the same as splitting the federation the way EFnet did, but the effect is similar. HTTP has somehow managed to live in a parallel universe, as it's technically still completely federated: anyone can start a web server if they have a public IP address and anyone can connect to it. The catch, of course, is how you find the darn thing. Which is how Google became one of the most powerful corporations on earth, and how they became the gatekeepers of human knowledge online. I have only briefly mentioned XMPP here, and my XMPP fans will undoubtedly comment on that, but I think it's somewhere in the middle of all of this. It was co-opted by Facebook and Google, and both corporations have abandoned it to its fate. I remember fondly the days where I could do instant messaging with my contacts who had a Gmail account. Those days are gone, and I don't talk to anyone over Jabber anymore, unfortunately. And this is a threat that Matrix still has to face. 
It's also the threat Email is currently facing. On the one hand corporations like Facebook want to completely destroy it and have mostly succeeded: many people just have an email account to register on things and talk to their friends over Instagram or (lately) TikTok (which, I know, is not Facebook, but they started that fire). On the other hand, you have corporations like Microsoft and Google who are still using and providing email services because, frankly, you still do need email for stuff, just like fax is still around, but they are more and more isolated in their own silos. At this point, it's only a matter of time before they reach critical mass and just decide that the risk of allowing external mail in is not worth the cost. They'll simply flip the switch and work on an allow-list principle. Then we'll have closed the loop and email will be dead, just like IRC is "dead" now. I wonder which path Matrix will take. Could it liberate us from these vicious cycles? Update: this generated some discussions on lobste.rs.

  1. According to Wikipedia, there are currently about 500 distinct IRC networks operating, on about 1,000 servers, serving over 250,000 users. In contrast, Mastodon seems to be around 5 million users, Matrix.org claimed at FOSDEM 2021 to have about 28 million globally visible accounts, and Signal lays claim to over 40 million souls. XMPP claims to have "millions" of users on the xmpp.org homepage but the FAQ says they don't actually know. On the proprietary silo side of the fence, this page says
    • Facebook: 2.9 billion users
    • WhatsApp: 2B
    • Instagram: 1.4B
    • TikTok: 1B
    • Snapchat: 500M
    • Pinterest: 480M
    • Twitter: 397M
    Notable omission from that list: YouTube, with its mind-boggling 2.6 billion users... Those are not the kind of numbers you just "need to convince a brother or sister" to grow the network...

16 June 2022

Dima Kogan: Ricoh GR IIIx 802.11 reverse engineering

I just got a fancy new camera: Ricoh GR IIIx. It's pretty great, and I strongly recommend it to anyone that wants a truly pocketable camera with fantastic image quality and full manual controls.

One annoyance is the connectivity. It does have both Bluetooth and 802.11, but the only official method of using them is some dinky closed phone app. This is silly. I just did some reverse-engineering, and I now have a functional shell script to download the last few images via 802.11. This is more convenient than plugging in a wire or pulling out the memory card.

Fortunately, Ricoh didn't bend over backwards to make the reversing difficult, so to figure it out I didn't even need to download the phone app and sniff the traffic. When you turn on the 802.11 on the camera, it says stuff about essid and password, so clearly the camera runs its own access point. Not ideal, but it's good enough. I connected, and ran nmap to find hosts and open ports: only port 80 on 192.168.0.1 is open. Pointing curl at it yields some error, so I need to figure out the valid endpoints. I downloaded the firmware binary, and tried to figure out what's in it:
dima@shorty:/tmp$ binwalk fwdc243b.bin
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
3036150       0x2E53F6        Cisco IOS microcode, for "8"
3164652       0x3049EC        Certificate in DER format (x509 v3), header length: 4, sequence length: 5412
5472143       0x537F8F        Copyright string: "Copyright ("
6128763       0x5D847B        PARity archive data - file number 90
10711634      0xA37252        gzip compressed data, maximum compression, from Unix, last modified: 2022-02-15 05:47:23
13959724      0xD5022C        MySQL ISAM compressed data file Version 11
24829873      0x17ADFB1       MySQL MISAM compressed data file Version 4
24917663      0x17C369F       MySQL MISAM compressed data file Version 4
24918526      0x17C39FE       MySQL MISAM compressed data file Version 4
24921612      0x17C460C       MySQL MISAM compressed data file Version 4
24948153      0x17CADB9       MySQL MISAM compressed data file Version 4
25221672      0x180DA28       MySQL MISAM compressed data file Version 4
25784158      0x1896F5E       Cisco IOS microcode, for "\"
26173589      0x18F6095       MySQL MISAM compressed data file Version 4
28297588      0x1AFC974       MySQL ISAM compressed data file Version 6
28988307      0x1BA5393       MySQL ISAM compressed data file Version 3
28990184      0x1BA5AE8       MySQL MISAM index file Version 3
29118867      0x1BC5193       MySQL MISAM index file Version 3
29449193      0x1C15BE9       JPEG image data, JFIF standard 1.01
29522133      0x1C278D5       JPEG image data, JFIF standard 1.08
29522412      0x1C279EC       Copyright string: "Copyright ("
29632931      0x1C429A3       JPEG image data, JFIF standard 1.01
29724094      0x1C58DBE       JPEG image data, JFIF standard 1.01
The gzip chunk looks like what I want:
dima@shorty:/tmp$ tail -c+10711635 fwdc243b.bin > /tmp/tst.gz
dima@shorty:/tmp$ < /tmp/tst.gz gunzip | file -
/dev/stdin: ASCII cpio archive (SVR4 with no CRC)
dima@shorty:/tmp$ < /tmp/tst.gz gunzip > tst.cpio
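The offset arithmetic in the tail command above is easy to get wrong: binwalk reports 0-based byte offsets, while tail -c +N counts bytes from 1, which is why 10711634 becomes 10711635. Here is a minimal sketch of that step as a reusable function. The name carve_from_offset and the throwaway demo file are illustrative assumptions, not part of the original workflow.

```shell
# carve_from_offset FILE OFFSET
# Write everything from the 0-based byte OFFSET of FILE to stdout.
# tail -c +N is 1-based, so N is OFFSET + 1.
carve_from_offset() {
    tail -c "+$(( $2 + 1 ))" "$1"
}

# Demo on a throwaway file instead of the real firmware image:
# 5 bytes of padding followed by a gzip stream.
demo=$(mktemp)
printf 'XXXXX' > "$demo"
printf 'hello\n' | gzip -c >> "$demo"

# The gzip stream starts at 0-based offset 5.
carved=$(carve_from_offset "$demo" 5 | gunzip -c)
echo "$carved"
rm -f "$demo"
```

Against the firmware, the equivalent call would be carve_from_offset fwdc243b.bin 10711634, with the output redirected to a file.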
OK, we have some .cpio thing. It's plain-text. I grep around in it, looking for GET and POST and such, and I see various URI-looking things at /v1/..... Grepping for that I see
dima@shorty:/tmp$ strings tst.cpio | grep /v1/
GET /v1/debug/revisions
GET /v1/ping
GET /v1/photos
GET /v1/props
PUT /v1/params/device
PUT /v1/params/lens
PUT /v1/params/camera
GET /v1/liveview
GET /v1/transfers
POST /v1/device/finish
POST /v1/device/wlan/finish
POST /v1/lens/focus
POST /v1/camera/shoot
POST /v1/camera/shoot/compose
POST /v1/camera/shoot/cancel
GET /v1/photos/ / 
GET /v1/photos/ / /info
PUT /v1/photos/ / /transfer
/v1/photos/<string>/<string>
/v1/photos/<string>/<string>/info
/v1/photos/<string>/<string>/transfer
/v1/device/finish
/v1/device/wlan/finish
/v1/lens/focus
/v1/camera/shoot
/v1/camera/shoot/compose
/v1/camera/shoot/cancel
/v1/changes
/v1/changes message received.
/v1/changes issue event.
/v1/changes new websocket connection.
/v1/changes websocket connection closed. reason( )
/v1/transfers, transferState( ), afterIndex( ), limit( )
Jackpot. I pointed curl at most of these, and they do interesting things. Generally they all spit out JSON. /v1/liveview sends out a sequence of JPEG images. The thing I care about is /v1/photos/DIRECTORY/FILE and /v1/photos/DIRECTORY/FILE/info. The result is a script I just wrote to connect to the camera, download N images, and connect back to the original access point: https://github.com/dkogan/ricoh-download Kinda crude, but works for now. I'll improve it with time. After I did this I found an old thread from 2015 where somebody was using an apparently-compatible camera, and wrote a fancier tool: https://www.pentaxforums.com/forums/184-pentax-k-s1-k-s2/295501-k-s2-wifi-laptop-2.html
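Fetching a single image from the /v1/photos/DIRECTORY/FILE endpoint could look roughly like the sketch below. It is not the ricoh-download script linked above. The function names photo_url and fetch_photo and the CAMERA_HOST variable are hypothetical, and DIRECTORY and FILE.JPG are placeholders for names you would first have to read from the /v1/photos listing, whose JSON layout is not described here.

```shell
# Camera address as found with nmap in the post; override via the environment.
CAMERA_HOST="${CAMERA_HOST:-192.168.0.1}"

# photo_url DIRECTORY FILE
# Build the URL of one image on the camera.
photo_url() {
    printf 'http://%s/v1/photos/%s/%s\n' "$CAMERA_HOST" "$1" "$2"
}

# fetch_photo DIRECTORY FILE
# Download one image into the current directory under its own file name.
# --fail makes curl exit non-zero on an HTTP error instead of saving the error body.
fetch_photo() {
    curl --fail --silent --show-error --output "$2" "$(photo_url "$1" "$2")"
}

# Only print the URL here; fetch_photo needs a live connection to the camera.
photo_url DIRECTORY FILE.JPG
```

Keeping the URL construction separate from the download makes it possible to check the former without being connected to the camera's access point.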

10 June 2022

Sam Hartman: Flailing to Replace Jack with Pipewire for DJ Audio

I could definitely use some suggestions here, both in terms of things to try and effective places to ask questions about Pipewire audio. The docs are improving, but are still in early stages. Pipewire promises to combine the functionality of PulseAudio and Jack. That would be great for me. I use Jack for my DJ work, and it's somewhat complicated and fragile. However, so far my attempts to replace Jack have been unsuccessful, and I might even need to use PulseAudio instead of Pipewire to get the DJ stuff working correctly.

The Setup

In the simplest setup I have a DJ controller. It's both a MIDI device and a sound card. It has 4-channel audio, but it's not typical surround sound. Two channels are the main speakers, and two channels are the headphones. Conceptually it might be better to model the controller as two sinks: one for the speakers and one for the headphones. At a hardware level they need to be one device for several reasons, especially including using a common clock. It's really important that only the main mix go out channel 1-2 (the speakers). Random beeps or sound from other applications going out the main speakers is disruptive and unprofessional. However, because I'm blind, I need that sound. I especially need the output of Orca (my screen reader) and Emacspeak (another screen reader). So I need that output to go to the headphones.

Under Pulse/Jack

The DJ card is the Jack primary sound device (system:playback_1 through system:playback_4). I then use the module-jack-sink Pulse module to connect Pulse to Jack. That becomes the default sink for Pulse, and I link front-left from that sink to system:playback_3. So, I get the system sounds and screen reader mixed into the left channel of my headphones and nowhere else.

Enter Pipewire

Initially Pipewire sees the DJ card as a 4-channel sound card and assumes it's surround4.0 (so front and rear left and right). It helpfully expands my stereo signal so that everything goes to the front and rear.
So, exactly what I don't want to have happen happens: all my system sounds go out the main speakers (channel 1-2). It was easy to override Wireplumber's ALSA configuration and assign different channel positions. I tried assigning something like a1,a2,fl,fr hoping that Pipewire wouldn't mix things into aux channels that weren't part of the typical surround set. No luck. It did correctly reflect the channels in things like pacmd list sinks so my Pipewire config was being applied. But the sound was still wrong.

I tried turning off channelmix.upmix. That didn't help; that appears to be more about mixing stereo into center, rear and LFE. The basic approach of getting a stream to conform to the output node's channels appears to be hurting me here. I'd love any ideas about how I can get this to work. I'm sure it's simple; I'm just missing the right mental model or knowledge of how to configure things.

Pipewire Not Talking to Jack

I thought I could at least use Pipewire the same way I use Pulse. Namely, I can run a real jackd and connect up Pipewire to that server. According to the wiki, Pipewire can be a Jack client. It's disabled by default, because you need to make sure that Wireplumber is using the real Jack libraries rather than the Pipewire replacements. That's the case on Debian, so I enabled the feature. A Jack device appeared in wpctl status as did a Jack sink. Using jack_lsp on that device showed it was talking to the Jack server and connected to system:playback_*. Unfortunately, it doesn't work. The sink does not show up in pacmd list sinks, and pipewire-pulse gives an error about it not being ready. If I select it as the default sink in wpctl set-default I get no sound at all, at least from Pulse applications.

Versions of things

This is all on Debian, approximately testing/bookworm or newer for the relevant libraries.
